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Several  authors  have  recently  had  apparent  success  in  applying 
continuous-time  point  process  models  to  daily  rainfall  observation 
sequences.  In  this  work  it  is  shown  that  major  problems  arise  when  the 
observation  sequence  represents  cumulative  rainfall  amounts  over  a 
period  (e.g.,  one  day)  which  is  on  the  order  of  the  process 
interarrival  time.    In  particular,  the  use  of  continuous-time  point 
process  models  for  daily  rainfall  occurrences  may  result  in  incorrect 
inferences  about  the  underlying  rainfall  generating  mechanisms.  This 
was  confirmed  by  the  statistical  analysis  of  six  daily  rainfall 
records  from  diverse  climatologic   regimes  throughout  the  U.S. 
(Snoqualmie  Falls,  Washington;  Roosevelt,  Arizona;  Austin,  Texas; 
Miami,  Florida;  Philadelphia,  Pennsylvania;  and  Denver,  Colorado).  In 
addition,  the  use  of  continuous-time  point  process  models  for 
generation  of  daily  rainfall  sequences  leads  to  serious  upward  biases 
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in  the  event-interarrival  times  and  in  dependence  structures  that  may 
be  much  different  than  those  of  the  apparent  rainfall  generating 
process. 

In  this  work,  a  discrete-time  point  process  model  has  been 
developed  and  its  structural  properties  derived.     In  the  proposed 
process  the  sequence  of  times  between  events  is  formed  through 
sampling  from  two  geometric  distributions,  according  to  transition 
probabilities  specified  by  a  Markov  chain.    This  process  belongs  to 
the  class  of  semi-Markov  (or  Markov  renewal)  processes  and  is  a 
non-renewal,  clustered  (relative  to  the  Bernoulli)   process  which 
reduces  to  a  renewal  process  with  a  mixture  distribution  for  the 
interarrival  times  as  a  special  case.    Several  methods  for  fitting  the 
proposed  model  have  been  studied  and  an  approximate  maximum  likelihood 
estimation  method  has  been  found  to  perform  adequately,  especially  for 
daily  rainfall  structures  with  not  very  significant  autocorrelation 
structures. 

The  semi-Markov  model  was  fitted  to  the  daily  rainfall 
occurrences  of  the  Snoqualmie  Falls  and  Roosevelt  stations,  both  on  a 
monthly  and  seasonal  basis.    The  fit  of  the  model  was  assessed  by  the 
preservation  of  selected  statistical  properties  of  the  series  which 
were  not  used  directly  in  the  estimation.     It  was  shown  that  the 
fitted  model  gave  a  theoretical  spectrum  of  counts  surprisingly  close 
to  the  empirical  one.     Also,  the  model  was  quite  successful  in 
preserving  the  distributional  properties  of  the  cumulative  rainfall 
amounts  over  longer  periods  of  time,  particularly  for  the  Snoqualmie 
Falls  station. 
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CHAPTER  1 
INTRODUCTION 


It  was  six  men  of  Indostan 
To  learning  much  inclined, 

Who  went  to  see  the  Elephant 
(Though  all  of  them  were  blind). 

That  each  by  observation 
Might  satisfy  his  mind. 

Rainfall  is  the  result  of  a  complex  atmospheric  process  evolving 

continuously  over  space  and  time.     At  any  time,  rainfall  fields  are 

characterized  by  their  areal   extent  and  their  spatially  variable 

intensity.    Austin  and  Houze  (1972)  and  Hobbs  and  Locatelli  (1978)  have 

classified  rainfall  fields  according  to  their  areal  extent  and  lifetimes 

as  synoptic,  large  mesoscale,  small  mesoscale  and  rain  cells.  Synoptic 

4  2 

rainfall  fields  cover  areas  on  the  order  of  10    km    and  have  a  lifetime 

of  one  to  several  days;  large  mesoscale  fields  cover  areas  of  10^  -  10*^ 
2 

km    and  have  a  lifetime  of  several  hours;  small  mesoscale  fields  have 

2  2 

areal  extent  of  10  -  10    km    and  a  lifetime  of  approximately  one  hour; 

2 

rain  cells  have  areal  extent  of  1  -  10  km    and  lifetimes  of  a  few  minutes 
to  1/2  hour.    The  system  of  rainfall  fields  is  hierarchical  in  the  sense 
that  larger  rainfall  fields  usually  contain  one  or  more  of  the  smaller 
ones.  The  continuous  movement,  build-up,  and  dissipation  of  rainfall 
fields  determines  the  rainfall  intensity  variations  in  space  and  time. 

Space-time  modeling  of  an  observed  rainfall  sequence  at  a  point 
based  on  a  mathematical   description  of  the  underlying  atmospheric 


1 


2 


processes  would  be  an  extremely  complicated,  if  not  impossible,  task. 
The  need  for  mathematically  tractable  descriptions  of  rainfall  for 
operational   purposes,  i.e.,  forecasting  for  day-to-day  operation  of 
hydrologic  systems,  has  motivated  treatment  of  rainfall  as  a  stochastic 
process.    Approaches  to  the  space-time  stochastic  modeling  of  rainfall 
have  recently  been  suggested  by  Waymire  et  al.  (1984)  and  Kavvas  and  Herd 
(1984).    In  the  present  work,  only  the  time  variability  of  rainfall, 
i.e.,  the  characterization  of  a  precipitation  observation  sequence  at  a 
single  station,  is  considered. 

Point  rainfall  is  the  precipitation  intercepting  a  small  area  such 
as  the  opening  of  a  rain  guage;  it  may  be  treated  as  a  continuous-time 
intermittent   process,   say  with    intensity     E,{t).  Precipitation 
measurements  are  recorded  for  cumulative  amounts  over  discrete  time 
intervals  such  as  minutes,  hours,  or  days.    Let  {Y^.}-p,  i  =  1,2,3,... 
denote  the  discrete  sequence  of  rainfall  observations  over  an  arbitrary 
time  interval  T.    The  continuous  process  ^(t)  is  related  to  the  discrete 
process  {Y.}-p  by 


where  t^.  -  t^._^  =  T  is  the  time  of  measurement.    Figure  1.1  illustrates 
the  relationship  between  C(t)  and  Y^^-j-^:    the  continuous  process  ^(t)  is 
integrated  over,  say,  daily  time  intervals  to  give  the  sequence  of  daily 
data  ^Y.}-p,  where  T  =  1  day. 


i-1 


(1.1) 


Rainfall 
intensity 
(mm/sec) 


Figure  1.1    Continuous  rainfall  process  C(t)    and  discrete  hourly  and  daily 
rainfall  sequences  {Y.}^  . 
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Rodriguez-Iturbe  et  al .  (1984)  view  z,{t)  as  "a  generalized  stochastic 
process"  representing  the  instantaneous  rate  of  rainfall.    By  postulating 
several  continuous-time  models  for  C(t) ,  they  have  derived  distributional 
properties  of  the  discrete  accumulated  amounts  for  an  arbitrary  time 

scale  T.     This  approach  raises  some  fundamental  questions,  as  Diggle 
(1984)  has  recently  pointed  out,  specifically  what  is  a  suitable  class  of 
models  for  C(t),  and  how  can  inferences  about  C(t)  be  made,  given  data  in 
the  form  of  daily  or  hourly  accumulated  amounts?    Rodriguez-Iturbe  et  al. 
(1984)  assessed  the  validity  of  possible  models  for  5(t)  by  comparing 
parameters  estimated  from  hourly  and  daily  data  with  the  theoretical 
parameters  of  {Y^}^,  T  =  1  hour  and  1  day,  derived  for  several  candidate 
models  for  C(t).     This  approach,  however,  is  primarily  of  theoretical 
interest  because  it  does  not  suggest  a  model  for  the  observed  discrete 
rainfall  sequences  ^Y^.  }y  but  rather  for  the  unobserved  continuous  process 
?(t),  and  the  derived  distributional  properties  of  ^Y^}-!-  do  not  lead  to  a 
parsimonious  representation  of  the  discrete  process. 

In  this  work  a  somewhat  different  approach  is  suggested.  The 
appropriateness  of  several  model  structures  for  the  discrete  process  (Y^. }-|. 
has  been  examined,  with  emphasis  on  a  one  day  interval.    The  ultimate  goal 
is  to  derive  a  realistic  parameter-parsimonious  model  suitable  for  the 
analysis  and  synthesis  of  daily  rainfall  sequences. 

Although  daily  rainfall  observation  sequences  are  only  one  possible 
discrete  aggregation  of  the  time-continuous  rainfall  process,  the  daily 
scale  is  of  special  interest  for  several  reasons.    Many  water  resource 
systems  are  operated  on  a  daily  time  step.     For  example,  operational 
decisions  for  small  reservoirs  for  water  supply  and  irrigation  scheduling 
(e.g.,  Ramirez-Rodriguez  and  Bras,  1982)  are  often  made  on  a  daily  time 
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scale,  and  therefore  an  adequate  mathematical  description  of  the  daily 
rainfall  input  is  necessary.     More  generally,  though,  one  day  may  be 
considered  as  the  upper  limit  of  event  scale  for  precipitation;  larger 
scale  precipitation  sequences  no  longer  reflect  individual  precipitation 
events,  and  take  on  a  fundamentally  different  statistical  structure. 
Another  reason  for  modeling  daily  rainfall  is  that  most  U.S.  rainfall 
stations  are  cooperative,  that  is,  the  data  are  not  collected  by  the 
National  Weather  Service.     Most  cooperative  stations  report  daily 
precipitation  totals.    Manned  or  remote  National  Weather  Service  stations 
are  much  fewer  in  number;  generally,  it  is  these  stations  that  collect 
hourly  or  shorter  increment  data.    It  should  be  emphasized  that  the  model 
developed  herein  is  not  restricted  to  the  daily  time  scale;  it  is  expected 
that  much  of  the  work  will  also  be  applicable  to  smaller  time  scales  such 
as  hourly.    Nonetheless,  the  emphasis  in  this  work  is  on  the  daily  time 
scale. 

The  stochastic  structure  of  daily  rainfall   occurrences  has  been 
extensively  studied  over  the  past  two  decades.    A  classification  of  the 
modeling  approaches  and  a  literature  review  of  the  models  proposed  are 
given  in  Chapter  2.    This  work  concentrates  on  only  one  approach,  namely, 
the  point  process  approach.    A  point  process  is  defined  as  a  sequence  of 
events  completely  characterized  by  the  location  (in  time  or  space)  of  the 
events.    The  daily  rainfall  occurrence  process  may  be  viewed  as  a  point 
process  in  which  an  event  takes  place  any  time  the  cumulative  rainfall 
amount  over  a  one-day  period  exceeds  a  specified  threshold  value,  as  for 
example  0.01  inches.    Under  the  above  definition  of  event,  and  given  that 
a  day  can  be  either  dry  (no  rain)  or  wet  (rain  exceeding  a  threshold 
value),  a  point  process  model  for  daily  rainfall  occurrences  will  identify 
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only  the  "state"  of  each  day  i.e.,  dry  or  wet.    In  that  sense,  the  point 
process  modeling  approach  for  daily  rainfall  is  conceptually  equivalent  to 
the  discrete  binary  series  approach  in  which  a  sequence  of  zeroes  and  ones 
(for  no  rain  and  rain)  is  formed  and  subsequently  modeled.    The  important 
difference,  however,  between  the  two  approaches  is  that  the  theory  of 
point  processes  permits  construction  of  models  with  much  more  flexible 
dependence  structures,  as  compared  with  the  structures  that  can  be 
formulated  under  the  theory  of  binary  time  series. 

An  example  of  a  discrete  binary  series  model  is  a  Markov  chain,  which 
has  been  used  by  a  number  of  authors  to  model  daily  rainfall  occurrences 
(e.g.,  Chin,  1977).    Newer  developments  in  discrete  binary  models  include 
the  work  of  Chang  et  al .  (1984)  who  introduced  the  discrete  autoregressive 
moving  average  (DARMA)  models  for  daily  rainfall  and  applied  them  to  the 
daily  rainfall   sequences  of  nine  stations  in  Indiana.     While  newer 
approaches  in  discrete  binary  models,  such  as  the  DARMA  models,  may  meet 
some  of  the  inadequacies  of  Markov  chains  for  daily  rainfall  occurrences, 
these  models  lead  to  complicated  recursive  formulas  for  the  distributional 
properties  of  the  interarrival  times  (e.g.,  Chang  et  al . ,  1982).    It  is 
the  author's  feeling    that  the  point  process  modeling  methodology  provides 
much  more  elegant  mathematical  formulations  of  a  stochastic  process,  and 
for  this  reason  discrete  binary  models  are  not  considered  further. 

Point  process  models  for  the  areal  distribution  of  rainfall  were 
first  introduced  by  LeCam  (1961).    Later  Kavvas  and  Delleur  (1975)  applied 
the  point  process  methodology  to  daily  rainfall  occurrences.    The  powerful 
theory  of  point  processes  was  illustrated  in  the  hydrological  literature 
by  Waymire  and  Gupta  (1981a,  b,  and  c)  in  a  series  of  three  papers.  The 
suitability  of  the   flexible   point    process   model    structures  for 
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small -time-increment  rainfall  sequences  encouraged  further  study  and  Smith 
(1981)  proposed  a  different  model  for  daily  rainfall  occurrences.  Details 
of  these  models  will  be  given  in  Chapter  2.     The  important  point  to  be 
made  here  is  that  previous  studies  have  applied  time-continuous  point 
process  models  to  the  daily  rainfall  occurrences.    However,  the  point 
process  of  daily  rainfall  occurrences  is  discrete  in  time.  The 
discreteness  stems  from  the  definition  of  the  event  as  a  day  with  rainfall 
above  a  threshold  value.     It  should  be  noted  here  that  throughout  the 
discussion  that  follows  we  have  used  the  short  term  continuous  (or 
discrete)  point  process  instead  of  the  accurate  term  continuous-time  (or 
discrete-time)  point  process,  primarily  for  convenience. 

In  this  work  it  will  be  demonstrated  that  continuous  point  process 
models  are  not  operationally  useful  for  daily  rainfall  and  that  discrete 
point  process  models  are  needed  instead.    In  addition,  it  will  be  shown 
that  the  theory  of  continuous  point  processes  is  not  appropriate  for 
modeling  daily  rainfall  occurrences  and  that  inferences  made  about  the 
underlying  rainfall  generating  mechanisms  by  comparing  sample  properties 
of  daily  rainfall  occurrences  to  the  independent  Poisson  process  may  be 
misleading.    In  view  of  the  above,  a  discrete  point  process  methodology 
will  be  suggested  and  a  discrete  point  process  model  with  demonstrated 
flexibility  introduced  and  applied  to  representative  daily  rainfall 
occurrence  structures.     Methods  for  fitting  this  model  will  also  be 
studied.    Finally,  the  model  will  be  coupled  with  a  model  for  the  non-zero 
daily  rainfall  amounts  to  give  an  operational,  parsimonious  model  for 
analysis  and  synthesis  of  daily  rainfall  sequences.    Further,  it  will  be 
shown  that  such  a  model  may  be  able  to  preserve  the  distributional 
properties  of  the  cumulative  rainfall  amounts  over  periods  of  specified 
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length,  e.g.,  a  week  or  month.    This  is  an  important  property  of  a  daily 
rainfall  model  especially  when  it  is  used  for  rainfal 1 -runoff  studies, 
where  mass  balance  over  long  periods  of  time  is  desired. 

In  summary,  this  dissertation  is  structured  as  follows.    In  Chapter 
2,  a  classification  and  brief  review  of  available  daily  rainfall  models 
are  given.    The  inappropriateness  of  the  continuous  point  process  models 
for  the  discrete  daily  rainfall  occurrences  is  demonstrated  in  Chapter  3. 
In  Chapter  4,  the  statistical  analysis  of  six  daily  rainfall  records  with 
respect  to  the  rainfall  occurrences  and  amounts  is  presented.    In  Chapter 
5,  a  discrete  point  process  model   is  developed  and  its  statistical 
properties  are  derived.     Methods  for  fitting  the  developed  model  are 
suggested  and  compared  in  Chapter  6.     In  Chapter  7,  the  discrete  point 
process  is  fitted  to  the  daily  rainfall  occurrences  of  two  stations  and 
coupled  with  a  model  for  the  non-zero  daily  rainfall  amounts.  The 
satisfactory  performance  of  the  model  is  assessed  by  checking  the  extent 
to  which  several   rainfall   statistics  are  preserved.     The  summary, 
conclusions,  and  recommendations  for  further  research  are  given  in  Chaptei 
8. 


CHAPTER  2 
LITERATURE  REVIEW 


The  First  approached  the  Elephant, 
And  happening  to  fall 

Against  his  broad  and  sturdy  side. 
At  once  began  to  bawl : 

"God  bless!  but  the  Elephant 
Is  very  1  ike  a  wal 1 ! " 

Due  to  the  intermittency  of  small-time-increment  (i.e.,  hourly  or 
daily)  rainfall  processes,  standard  time-series  analysis  methods  are  not 
applicable.    Instead,  the  most  commonly  used  approach  to  modeling  daily 
rainfall   is  to  model  the  rainfall  occurrences  separately  from  the 
non-zero  rainfall  amounts,  and  then  superimpose  the  two  models.  This 
chapter  classifies  the  existing  daily  rainfall  occurrence  and  amounts 
models  and,  under  each  category,  a  review  of  selected  work  is  given. 
For  supplementary  review  papers,  the  reader  is  referred  to  Roldan  and 
Woolhiser  (1982),  Woolhiser  and  Roldan  (1982),  Court  (1979),  and  Waymire 
and  Gupta  (1981a). 

2.1    Models  for  the  Daily  Rainfall  Occurrences 
2.1.1    The  "Wet-Dry  Spell"  Approach 

In  this  approach  any  uninterrupted  sequence  of  wet  days  (i.e.,  days 
with  total  rainfall  above  a  specified  threshold  value)  defines  an  event 
(see  Fig.  2.1a).    Such  an  occurrence  model  is  completely  specified  by 
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(A)  "Wet -dry  spell"  approach 
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(B)   Binary  discrete  series  approach 
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(C)   Point  process  approach 
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Figure  2.1    Different  approaches  to  modeling  daily  rainfall  occurrences. 
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the  probability  laws  of  the  length  of  the  wet  periods  (storm  duration) 
and  the  length  of  the  dry  periods  (time  between  storms). 

This  model  structure,  with  exponential  distributions  for  the 
lengths  of  the  dry  and  wet  periods,  was  used  by  Thom  (1958)  and  Green 
(1964),  among  others.     Grace  and  Eagleson  (1966)   used  a  Weibull 
distribution  for  the  wet-period  lengths  and  applied  the  model  to  short 
time  increment  (on  the  order  of  minutes  and  hours)  rainfall  occurrences. 
Todorovic  and  Yevjevich  (1969)  and  Eagleson  (1978)  conducted  later 
studies  using  this  modeling  approach. 

In  probabilistic  terminology  the  above  model  is  an  alternating 
renewal  model.     The  term  renewal  stems  from  the  implied  independence 
between  the  dry  and  wet  period  lengths,  and  the  term  alternating  is  used 
to  indicate  that  a  wet  (dry)  period  is  always  followed  by  a  dry  (wet) 
period,  i.e.,  no  transition  to  the  same  state  is  possible.     In  many 
early  studies,  such  a  model  with  exponential  distributions  for  the  dry 
period  lengths  was  referred  to  as  a  Poisson  model.     This  is  an 
inaccurate  terminology  resulting  from  the  assumption  that  an  event, 
which  in  this  case  corresponds  to  a  wet  period,  occurs  instantaneously 
at  the  middle  or  end  of  the  wet  period. 

A  recent  study  of  an  alternating  renewal  model  for  daily  rainfall 
was  reported  by  Galloy  et  al .  (1981).     They  used  discrete  negative 
binomial   distributions  for  the  wet   and  dry  period   lengths  and 
implemented  the  theory  of  point  processes  to  derive  the  statistical 
properties  of  intervals  (times  between  events)  and  counts  (number  of 
events  in  a  time  interval). 

The  main  problem  with  the  wet-dry  spell  approach  to  modeling  daily 
or  other  smal 1 -time- increment  rainfall   lies  in  the  modeling  of  the 
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rainfall  amounts.    Due  to  the  definition  of  an  event  as  an  uninterrupted 
sequence  of  wet  days,  the  cumulative  rainfall  amounts  corresponding  to 
each  event  are  conditional  on  the  wet-period  length.  Therefore, 
conditional  probability  distributions  have  to  be  fitted  to  the  amounts. 
This  can  pose  problems,  especially  for  events  of  extreme  duration,  where 
not  many  points  are  available  for  identification  and  fitting  of  a 
probability  density  function. 
2.1.2    The  Binary  Discrete  Time-Series  Approach 

The  daily  rainfall  series  consists  of  either  rainy  or  dry  days,  and 
therefore  can  be  viewed  as  a  binary  series  of  zeroes  and  ones,  with  zero 
corresponding  to  a  dry  day,  and  one  to  a  wet  day  (see  Fig.  2.1b.).  A 
probabilistic  model  is  then  sought  which  can  adequately  describe  this 
sequence  of  zeroes  and  ones.    Alternatively,  this  binary  process  can  be 
thought  of  as  being  formed  by  a  sequence  of  Bernoulli  trials  i.e., 
repetitive  trials  without  replacement  (see,  for  example.  Feller,  1968), 
with  two  possible  outcomes,  zero  and  one.    The  outcomes  can  be  either 
independent  (giving  rise  to  a  Bernoulli  process),  or  dependent  (giving 
rise,  for  example,  to  a  Markov  chain).     The  independent  Bernoulli 
process  is  not  adequate  to  describe  the  dependence  present  in  the  daily 
rainfall  occurrences  (see,  for  example.  Smith  and  Schreiber,  1973). 
Markov  chain  models  are  the  simplest  models  with  a  dependence  structure 
and  have  been  extensively  used  for  modeling  daily  rainfall. 

Markov  chains.    A  Markov  chain  is  a  sequence  of  discrete  random 
variables,  X^,  and  is  said  to  be  of  order  k  if  k  is  the  smallest 
positive  integer  such  that  the  following  equation  of  conditional 
probabilities  is  satisfied  for  all  n: 
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P{XjX^_..  i  =  1.  ...}  =  Pi\\\_^,  ....  X^.^}  (2.1) 

A  complete  treatment  of  the  theory  of  Markov  chains  can  be  found  in  Cox 
and  Miller  (1965),  Parzen  (1962),  and  Qinlar  (1975),  among  others.  A 
two-state  Markov  chain  (appropriate  for  the  zero-one  rainfall  occurrence 
process)  is  completely  specified  by  the  transition  probability  matrix: 


P  = 


1-Pi 


where  Pg  is  the  probability  of  a  dry  day  following  a  dry  day,  and  p^  is 
the  probability  of  a  wet  day  following  a  wet  day. 

Markov  chains  have  been  extensively  used  for  modeling  daily 
rainfall   occurrences.      Gabriel    and  Neumann   (1957,1962)    used  a 
first-order  homogeneous  (i.e.,  constant  parameters)  Markov  chain  for  the 
winter  daily  rainfall  occurrences  at  Tel-Aviv,  while  Caskey  (1963)  and 
Weiss  (1964)  used  a  non-homogeneous  (i.e.,  time  varying  parameters) 
Markov  chain  for  several  stations  in  the  northern  U.S.     Hopkins  and 
Robillard  (1964)  used  a  first  order  Markov  chain  for  the  daily  rainfall 
occurrences  in  Canada  and  found  that  it  was  not  adequate  to  describe  the 
months  with  few  rainy  days.     Feyerherm  and  Bark  (1967)  showed  the 
inadequacy  of  a  first-order  Markov  chain  in  describing  the  higher-order 
dependence  structure  present  in  daily  rainfall,  and  they  proposed  a 
second-order  Markov  chain  for  the  daily  rainfall  occurrences  at  Indiana, 
Iowa,  and  Kansas.    Wiser  (1965)  and  Green  (1965)  also  concluded  that  the 
geometric  memory  of  the  first-order  Markov  chain  is  not  adequate  to 
describe  long  droughts  or  long  wet  spells.  Smith  and  Schreiber  (1973) 
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found  a  non-homogeneous  first-order  Markov  chain  superior  to  an 
independent  Bernoulli  model  for  the  seasonal  thunderstorm  rainfall  in 
the  southwest  U.S.     Woolhiser  and  Pegram  (1979)  studied  Markov  chain 
models  with  seasonally  varying  parameters  using  Fourier  series. 

In  deciding  the  order  of  a  Markov  chain,  Tong  (1975)  used  the 
Akaike  Information  Criterion  (AIC),  while  Hoel    (1954)  presented  a 
likelihood  ratio  goodness  of  fit  test.     Chin  (1977)  identified  the 
Markov  chain  orders  of  25-year  daily  rainfall  records  in  the  United 
States,  using  the  AIC,  and  illustrated  their  dependence  on  the  season 
and  geographical  location. 

For  the  simultaneous  modeling  of  daily  rainfall  occurrences  and 
amounts,  multiple-state  Markov  chains  have  also  been  considered.  Khanal 
and  Hamrick  (1974)  used  a  14-state  Markov  chain  model  for  each  month  of 
the  year,  for  daily  rainfall  sequences  from  Florida.    Haan  et  al .  (1976) 
separated  the  year  into  four  seasons  and  used  a  seven-state  first-order 
Markov  chain  for  the  daily  rainfall  in  Kentucky.    They  assumed  uniform 
distributions  for  the  rainfall  amounts  in  all  but  the  last  state,  in 
which  a  shifted  exponential  was  found  more  appropriate  due  to  the  larger 
variability  in  the  amounts.    Carey  and  Haan  (1978)  used  a  three-state 
first-order  Markov  chain  with  two  different  Gamma  distributions  for  the 
amounts  in  the  two  wet  states,  which  were  further  combined  to  the  same 
pooled  distribution  due  to  the  large  number  of  parameters  in  their 
twelve-season  model. 

In  conclusion,  Markov  chain  models  provide  a  simple  mathematical 
representation  of  the  daily  rainfall  occurrence  process  which  may  be 
adequate  for  some  specific  sites  and  seasons.    However,  their  Markovian 
structure  cannot  describe  the  long  term  persistencies  (i.e.,  long  wet  or 
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dry  spells)  and  the  effect  of  clustering  (i.e.,  higher  likelihood  of 
having  an  event  due  to  an  event  at  a  previous  time)  present  in  the 
short-time- increment  rainfall  occurrences.    Long  term  persistence  in  the 
daily  rainfall  may  be  caused,  for  example,  by  cyclonic  activity 
persisting  during  certain  seasons  (Petterssen,  1969),  and  clustering  may 
be  the  result  of  frontal  thunderstorms  with  a  relatively  long  life  cycle 
(Kavvas  and  Delleur,  1975). 

Discrete  Autoregressi ve  Moving  Average   (PARMA)   models.  A 
DARMA(p,q)  model,  where  p  is  the  order  of  the  autoregressi ve  and  q  the 
order  of  the  moving  average  component,  is  a  sequence  {X^}  formed  by  a 
probabilistic  combination  of  elements  of  a  sequence  {Y^}  which  is 
independent  and  identically  distributed  (i.i.d.).    For  the  binary  DARMA 
models  (Y^}  is  assumed  to  be  i.i.d.  with  a  Bernoulli  distribution,  i.e., 
P(Y^  =  0)  =  tTq,  P(Y^  =  1)  =  tt^  ,  and  '^l  "  ^-  illustration 

purposes,  the  DARMA(1,0)  and  DARMA  (0,1)  models  are  defined  by  a 
sequence  {X^}  such  that 

{Y     with  probability  9 
(2.3) 
^n-1  ^^^^  probability  1-B 

(X    ,  with  probabil  ity  <j) 
(2.4) 
Y^    with  probability  1-0  . 

For  further  details  on  these  models  and  derivation  of  their  statistical 
properties,  see  Jacobs  and  Lewis  (1978a, b)  and  Chang  et  al.  (1982). 

DARMA  models  for  daily  rainfall  were  first  used  by  Buishand  (1978) 
to  analyze  wet-dry  spells  in  the  Netherlands.    Subsequently,  Chang  et 
al.  (1982,  1984)  applied  DARMA  models  to  daily  rainfall  sequences  in 
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Indiana.  They  derived  the  probability  density  functions  of  the  wet  and 
dry  period  lengths  as  functions  of  the  DARMA  model  parameters  (J),  9,  and 
the  marginal  distribution  function  ir  =  (ttq,  tt^^). 

Although  DARMA  models  may  be  an  improvement  over  Markov  chains,  in 
the  sense  that  they  can  accommodate  longer  term  persistence  in  the 
series  in  a  more  parsimonious  way  than  a  high-order  Markov  chain,  their 
linear  structure  is  still  not  able  to  describe  the  clustered  short-term 
dependence  known  to  be  present  in  the  daily  rainfall  occurrences  (Kavvas 
and  Delleur,  1975).    Also,  their  mathematical  framework  seems  to  permit 
derivation  only  of  interval  properties  (i.e.,  probability  distributions 
of  run  lengths)  and  not  of  counting  properties  (i.e.,  distributions  of 
number  of  events  in  a  time  interval).    Given  that  analysis  of  the  second 
order  properties  of  intervals  and  counts  are  not,  in  general,  equivalent 
(see,  for  example.  Cox  and  Lewis,  1978),  it  is  advantageous  to  be  able 
to  use  both  for  model  identification  and  fitting,  and  this  can  be 
effectively  done  in  the  point  process  mathematical  framework. 
2.1.3    The  Point-Process  Approach 

By  defining  an  event  as  the  occurrence  of  a  day  with  a  total 
rainfall  amount  exceeding  a  specified  threshold  (i.e.,  the  occurrence  of 
a  wet  day),  the  sequence  of  daily  rainfall  forms  a  point  process.  With 
the  above  definition  of  events,  a  wet  period  of  several  days  is  treated 
as  a  group  of  instantaneous  rainfall   events  occurring  at  one-day 
intervals  and,  therefore,  the  interarrival  times  are  positive  integer 
values  (1,  2,  3,  ...  days).    Such  a  point  process  is  discrete.    A  major 
issue,  which  is  deferred  to  Chapter  3,  is  how  to  accommodate  this 
feature  within  the  framework  of  continuous  point  processes.    The  present 
discussion  is  limited  to  continuous  point  processes. 
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The  theory  of  continuous  point  processes  has  been  studied  by  Cox 
and  Lewis  (1978),  Cox  and  Isham  (1980),  ginlar  (1975),  Parzen  (1962), 
Lewis  (1972),  Vere-Jones  (1970),  and  others.    Waymire  and  Gupta  (1981b, 
c)  give  an  excellent  review  of  the  theory  of  point  processes  and  relate 
it  to  the  stochastic  modeling  of  hydrologic  series. 

The  simplest  continuous  point  process  is  the  Poisson  process,  whose 
formal  definition  and  properties  can  be  found  in  many  probability  theory 
texts  (see  for  example  ^inlar,  1975,  Ch.4).    In  a  Poisson  process,  the 
times  between  events  are  independent  and  identically  distributed 
(i.i.d.)  random  variables  having  an  exponential  distribution,  and  the 
number  of  events  in  a  time  interval  t  is  an  i.i.d.   random  variable 
having  a  Poisson  distribution.     The  non-homogeneous  (time-varying 
parameters)  Poisson  process  has  been  applied  by  Todorovic  and  Yevjevich 
(1969)  and  Gupta  and  Duckstein  (1975)  to  the  modeling  of  rainfall 
occurrences. 

Kavvas  and  Delleur  (1975)  observed  that  the  daily  rainfall 
occurrences    in    Indiana    exhibit    a    clustering   which   might  be 
satisfactorily  modeled  by  the  class  of  Poisson  cluster  models  (e.g..  Cox 
and  Isham,  1980,  Ch.3)  and  in  particular  by  the  Neyman-Scott  (N-S) 
models.    A  N-S  process  is  a  two-level  process.    At  the  primary  level, 
the  rainfall  generating  mechanisms  (RGM)  occur  according  to  a  Poisson 
model  with  rate  of  occurrence  h^  (i.e.,  mean  interarrival  time  1/hg). 
Each  RGM  gives  rise  to  a  group  of  rainfall  events,  and  each  of  these 
groups  is  called  a  cluster.     Within  each  cluster,  the  occurrence  of 
events  is  completely  specified  by  a  distribution  for  the  number  of 
events  and  a  distribution  for  their  positions  relative  to  the  cluster 
centers.     Kavvas  and  Delleur  (1975)  assumed  a  geometric  distribution 
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with  parameter  p  for  the  number  of  rainfall  events  in  a  cluster  and  an 
exponential  distribution  with  parameter  0  for  the  distances  of  events 
from  their  cluster  centers.    For  these  distributions,  the  final  observed 
process  has  a  rate  of  occurrence  m  =  hg/p.     Applications  of  the  N-S 
model  include  modeling  of  the  areal  clustering  of  rainfall  (LeCam,  1961) 
and  modeling  of  earthquake  occurrences  (Vere-Jones,  1970). 

Smith  (1981)  introduced  another  point  process  model,  namely  a 
doubly  stochastic  Poisson  model,  to  describe  the  clustering  observed  in 
the  daily  rainfall  occurrences  of  the  summer  season  (July  to  October) 
rainfall  in  the  Potomac  river  basin.     In  a  doubly  stochastic  Poisson 
model  (also  known  as  a  Cox  model)  the  rate  of  occurrence  of  the  process 
alternates  between  two  states,  one  zero  and  the  other  positive.  During 
periods  when  the  intensity  is  zero,  no  events  can  occur.    Smith  and  Karr 
(1983)  assumed  that  during  periods  with  positive  intensity,  events  occur 
according  to  a  Poisson  process  with  rate  of  occurrence  X,  and  that  the 
sequence  of  states  visited  forms  a  Markov  chain.     This  model  is  a 
renewal  model  (i.e.,  interarrival  times  are  independent)  and  was  termed 
the  RCM  model  (Renewal  Cox  model  with  Markovian  intensity). 

In  summary,  two  main  classes  of  continuous  point  process  models 
(namely  the  N-S  model  and  the  RCM  model)  have  been  studied  for  the  daily 
rainfall  occurrence  process.    Both  of  these  models  are  overdispersed 
relative  to  Poisson,  that  is  the  variance  of  the  number  of  events  in  an 
arbitrary  time  interval  is  greater  than  the  mean  number  of  events  in 
that  interval,  as  compared  to  the  Poisson  model  in  which  the  variance  is 
equal  to  the  mean.    Our  analysis  has  pointed  out  that  structures  of 
daily  rainfall  occurrences  that  are  underdispersed  relative  to  Poisson 
are  possible  (more  regular  occurrence  of  events  than  that  of  a  Poisson 
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process) ,  a  feature  that  cannot  be  reproduced  by  either  the  N-S  or  the 
RCM  models.     More  significant,  however,  is  the  question  of  whether 
continuous  models  are  appropriate  for  modeling  discrete  daily  rainfall 
occurrences.    This  is  an  especially  significant  issue  when  the  time 
scale  is  daily,  since  the  observation  sequence  represents  cumulative 
rainfall  amounts  over  a  period  (i.e.,  one  day)  which  is  on  the  order  of 
the  process  interarrival  time.    In  Chapter  3,  it  will  be  shown  that  the 
more  natural  way  to  proceed  is  to  model  the  daily  rainfall  occurrence 
process  as  a  discrete  point  process.    First,  however,  a  review  of  recent 
work  on  modeling  rainfall  amounts  is  given. 

2.2    Models  for  the  Non-Zero  Daily  Rainfall  Amounts 
If  the  non-zero  daily  rainfall  amounts  process  is  independent,  then 
it  is  completely  characterized  by  its  marginal   probability  density 
function  (pdf).    The  marginal  probability  distributions  most  commonly 
used  are  the  following: 

(1)  The  exponential  distribution  (Todorovic  and  Woolhiser,  1971; 
Richardson,  1981)  which  is  a  one-parameter  distribution.     Skees  and 
Shenton  (1974)   and  Mielke  and  Johnson  (1974)   suggested  that  the 
exponential  distribution  has  a  thinner  tail  than  that  observed  in  daily 
rainfall  amounts. 

(2)  The  mixed  exponential  distribution  (Smith  and  Schreiber,  1973; 
Woolhiser  and  Pegram,   1979;   Woolhiser  and  Roldan,    1982)  whose 
coefficient  of  variation  is  always  greater  than  unity,  as  is  usually  the 
case  in  the  daily  rainfall  amounts.    This  distribution  has  the  appealing 
interpretation  of  being  the  superposition  of  two  or  more  exponential 
distributions  produced  by,   say,  different  mechanisms.     The  mixed 
exponential  distribution  was  found  to  be  the  best  of  four  candidates  for 
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daily  rainfall  amounts  in  a  comparison  study  by  Woolhiser  and  Roldan 
(1982).  Everitt  and  Hand  (1981,  Ch.3)  discuss  methods  of  identifying 
and  fitting  mixed  distributions. 

(3)  The  gamma  distribution  which  has  been  used  extensively  (see,  for 
example,  Ison  et  al . ,  1971;  Buishand,  1978;  Carey  and  Haan,  1978). 

(4)  The  Kappa  or  generalized  beta  distribution  introduced  by  Mielke  ' 
(1973)  and  Mielke  and  Johnson  (1974). 

(5)  Empirical  distributions  as,  for  example,  that  used  by  Cole  and 
Sheriff  (1972)  or  other  special  distributions.    For  example,  Katz  (1977) 
used  a  chain-dependent  distribution  assuming  that  the  rainfall  amounts 
are  independent  but  that  the  distribution  function  depends  on  whether 
the  previous  day  was  wet  or  dry.    Buishand  (1978)  distinguished  between 

i 

three  different  types  of  wet  days  (DWD,  DWW,  WWD,  and  WWW  where  D  stands 
for  dry  and  W  for  wet)  and  fitted  different  Gamma  distributions  to  each 
of  the  three  rainfall  amounts.    All  these  distributions,  however,  have 
the  disadvantage  of  too  many  parameters. 

Woolhiser  and  Roldan  (1982)  present  a  comparison  of  several 
distributions  (chain-dependent  and  independent  exponential,  gamma,  and 
mixed  exponential  distributions)  for  five  U.S.   stations  in  Kansas, 
Missouri,  Florida,  Wyoming,  and  Indiana.    Using  the  Akaike  Information 
Criterion  these  distributions  ranked  from  best  to  worst  as  mixed 
exponential,  independent  gamma,  chain-dependent  gamma,  and  exponential. 
It  should  be  noted  that  in  the  above  study  the  degree  of  dependence  of 
rainfall  amounts  in  consecutive  days  was  not  tested  and  independence  was 
assumed. 

If  a  dependence  structure  is  present,  then  more  complicated  models 
have  to  be  used.     Commonly  used  time-series  models,  such  as  those 


21 


described  by  Box  and  Jenkins  (1976),  are  not  appropriate  because 
rainfall  amounts  are  bounded  from  below  by  zero,  and  are  therefore 
positively  skewed.     The  class  of  Exponential-ARMA  (EARMA),  Gamma-AR 
(GAR),    or    standard    ARMA   models    together   with  normalization 
transformations  might  be  considered,  however.    ARMA  models  with  skewed 
marginal  pdf's  (especially  exponential  and  Gamma)  have  been  extensively 
studied  by  Lawrance  and  Lewis  (1980),  Lawrance  (1980),  Gaver  and  Lewis 
(1980),  Jacobs  and  Lewis  (1977),  Lawrance  and  Lewis  (1977),  and  Lewis 
(1978)  and  have  been  applied  to  hydrology  by  Obeysekera  and  Sal  as  (1983) 
for  streamflow  modeling.    Raudkivi  and  Lawgun  (1972)  have  proposed  a 
scheme  for  modeling  serially  correlated  data  with  skewness  described  by 
a  Pearson  type  3  distribution.     Although  they  have  applied  their 
technique  to  rainfall  durations  (defined  as  the  length  of  non-zero 
10-minute  interval  rainfall  depths),  the  potential  use  in  modeling  daily 
rainfall  amounts  seems  straightforward. 


CHAPTER  3 

CONTINUOUS  VERSUS  DISCRETE  POINT  PROCESS  MODELS 
FOR  DAILY  RAINFALL  OCCURRENCES 


The  Second,  feeling  of  the  tusk, 
Cried,  "Ho!  what  have  we  here 

So  very  round  and  smooth  and  sharp? 
To  me  'tis  mighty  clear 

This  wonder  of  an  Elephant 
Is  very  1  ike  a  spear! " 

When  daily  rainfall  occurrences  are  modeled  as  a  continuous  point 
process,  it  is  implied  that  events  can  occur  anywhere  on  the  time 
axis,  i.e.,  that  multiple  occurrences  during  a  day  are  possible.  The 
only  information,  however,  that  is  contained  in  the  daily  rainfall 
occurrence  data  is  whether  a  day  is  dry  or  wet,  i.e.,  whether  or  not 
at  least  one  event  has  occurred  during    a  day,  and  not  the  number  of 
events.    With  the  continuous-point-process  interpretation  of  the  daily 
rainfall    occurrences,   one  faces  the  problem  of   inferring  the 
properties  of  the  (unobservable)  continuous  counting  process  from  the 
discrete  sampled  data.     Bril linger  (1978)  and  Guttorp  and  Thompson 
(1983)  have  studied  this  problem  and  have  proposed  methods  of 
estimating  the  parameters  of  a  continuous  point  process,  as  well  as 
approximately  reconstructing  the   locations  of  events,   from  the 
discrete  sampled  data.     Such  methods,  however,  are  applicable  only 
when  the  sampled  counting  process  provides  at  least  the  information  of 
the  number  of  events  during  the  sampling  intervals,  as  for  example  in 
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the  process  of  daily  traffic  fatalities  where  the  number  of  fatalities 
during  a  day  are  recorded  but  not  the  exact  times  of  occurrence. 

Daily  rainfall  occurrence  data,  on  the  other  hand,  do  not  contain 
information  about  the  number  of  "events"  in  a  day,  and  therefore  their 
interpretation  as  a  continuous  point  process  is  complicated.  For 
example,  if  a  continuous  point  process  model  is  used  for  generation  of 
(synthetic)  daily  rainfall  occurrences,  the  most  natural  approach  to 
discretizing  the  continuous  sequence  is  to  lump  all  the  occurrences 
during  a  one  day  interval  to  only  one  point,  say,  at  the  end  of  that 
day.    The  result  of  such  an  operation  is  a  discretized  point  process 
with  a  lower  rate  of  occurrence  and  altered  statistical  properties. 
The  greater  the  rate  of  occurrence,  i.e.,  the  more  frequent  the 
events,  the  more  serious  the  discretization  effect  will  be.    For  rates 
corresponding  to  daily  rainfall  occurrences  (x  =  0.5  to  0.2  days"''', 
for  mean  interarrival  times  of  2  to  5  days),  these  effects  are  fairly 
significant,  in  contrast  with  rates  corresponding  to,  say,  hourly 
rainfall  events  or  occurrence  of  wet  periods,  i.e.,  interrupted 
sequences  of  rainy  hours  or  days.    In  general,  the  use  of  continuous 
point  process  models  for  discrete  observation  sequences  will  present 
major  problems  when  the  observation  sequences  represent  cumulative 
rainfall  amounts  over  a  period  which  is  on  the  order  of  the  process 
interarrival  time. 

How  much  such  a  discretization  scheme  affects  a  Poisson  process 
with  rate  of  occurrence  \  can  easily  be  shown  analytically.  For 
example,  it  can  be  shown  that  in  order  to  obtain  a  discrete  point 
process  with  rate  of  occurrence  X,  a  continuous  Poisson  process  with 
rate  of  occurrence  equal  to  -ln(l  -  x)  >  X  is  required.  Similar 
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results  for  non-Poisson  processes  such  as  Poisson  cluster  processes 
are  not  easily  obtained  in  closed  form  and  are  better  studied  via 
simulation. 

Modeling  daily  rainfall  occurrences  by  using  continuous  models 
with   adjusted   parameters   to   compensate   for   the   effects  of 
discretization  are  awkward  at  best  and  generally  impractical.  The 
natural  approach  to  modeling  the  daily  rainfall  occurrences  is  to  use 
only  the  information  provided  by  the  data,  i.e.,  to  view  the  rainy 
days  as  constituting  all  of  the  events  of  the  process,  and  to  generate 
rainy  or  dry  days  rather  than  continuous  events.     Some  of  the 
implications  of  this  viewpoint  are  considered  in  this  chapter.  First, 
however,  some  definitions  and  general  properties  of  a  continuous  point 
process,  necessary  for  the  development  of  the  rest  of  this  work,  are 
given. 

3.1    Statistical  Background  on  Stationary  Point  Processes 
Let  an  event     occur  at  times  t^ ,  t2,  t^,...,  and  let 
=  t^  -  t^_^  (r  =  1,2,3,...)  be  continuous  random  variables 
identically  distributed  with  common  pdf  f(x).     The  variable  X  is 
called  the  interarrival  time,  or  time  between  events,  or  simply 
interval.    A  point  process  is  stationary  if  the  joint  distribution  of 
the  number  of  events  in  a  set  of  k  fixed  intervals,  for  all  k  = 
1,2,3,...,  is  invariant  under  translation  (Cox  and  Lewis,  1978).  Two 
immediate  consequences  of  stationarity  are  that  (1)  the  distibution  of 
the  number  of  events  in  an  interval  depends  only  on  the  length  of  the 
interval,  and  (2)  there  is  no  trend  in  the  mean  rate  of  occurrence  of 
events,  i.e.,  the  expected  number  of  events  in  an  interval  is 
proportional  to  the  length  of  that  interval. 
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Depending  on  whether  or  not  an  event  occurs  at  time  tQ,  the 
process  starts  with  an  arbitrary  event  or  at  an  arbitrary  time 
(synchronous   and   asynchronous   sampling,    respectively,    in  the 
terminology  of  Lawrance  (1972)).     For  the  rainfall  occurrences,  we 
have  considered  the  process  as  starting  at  an  arbitrary  event  but  not 
including  it.    This  implies  that  the  pdf  of  the  time  to  the  first 
event  is  the  same  as  the  pdf  of  all  the  other  subsequent  interarrival 
times. 

Continuous  point  processes  have  been  extensively  studied  in  the 
statistical  literature  (see  for  example.  Cox  and  Lewis,  1978;  Cox  and 
Isham,  1980;  finlar,  1975;  Lewis,  1972;  Srinivasan,  1974).    They  have 
found  extensive  applications:  in  queueing  theory  (Khintchine,  1960); 
modeling  times   to  computer  failure   (Lewis,   1964);  earthquake 
occurrences  (Vere-Jones,  1970);  traffic  data  (Bartlett,  1963);  spatial 
distibution  of  galaxies  (Neyman  and  Scott,  1958);  and  rainfall 
occurrences  (LeCam,  1961;  Kavvas  and  Delleur,  1975;  Waymire  and  Gupta, 
1981;  and  Smith  and  Karr,  1983).    Additional  applications  can  be  found 
in  a  series  of  papers  edited  by  Lewis  (1972). 

In  studying  a  series  of  events  (point  process),  two  properties 
are  of  interest:  the  interval  properties  dealing  with  the  times 
between  events,  and  the  counting  properties  dealing  with  the  number  of 
events  in  time  periods  of  specified  length.     The  second  order 
properties  of  intervals  and  counts  which  will  be  used  in  this  work  are 
introduced  below. 

Interval  properties.    Let  {X.}  be  the  series  of  interarrival 
times.    We  denote  the  mean,  variance,  and  coefficient  of  variation  of 
X  by  E(X),  Var(X),  and  c^,  respectively.     Standard  time  series 
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analysis,  using  the  autocorrelation  function  and  the  power  spectrum  of 
the  series,  can  be  applied  to  test  the  presence  of  autocorrelation. 
Independent  interarrival   times   (inferred  by  an  autocorrelation 
function  not  significantly  different  from  zero,  or  a  constant  power 
spectrum)  indicate  that  the  process  is  a  renewal  point  process. 
Special  care  with  the  properties  of  the  autocorrelation  coefficients 
is  needed,  however,  due  to  the  non-normality  (high  skewness)  of 
interarrival  times  (Lewis,  1972).    For  example,  Moran  (1970)  and  Cox 
(1966)  have  shown  that  the  variance  of  the  first  autocorrelation 
coefficient  tends  to  be  smaller  for  random  variables  with  long  tails 
than  for  variables  with  a  normal  distribution. 

The  departure  of  the  coefficient  of  variation,  c^,  from  the  value 
of  one  for  the  exponential  distribution,  is  used  as  a  rough  measure  of 
the  departure  of  the  process  from  the  Poisson  process  (Cox  and  Lewis, 
1978).    A  value  of  c^  >  1  indicates  overdispersion  relative  to  the 
Poisson  process  ("random"  clustering),  and  a  value  of  c^  <  1  indicates 
underdispersion  ("regular"  clustering). 

Let  F(x)  =  P(X  <^  x)  be  the  cumulative  probability  distribution  of 
the  interarrival  times.    Then,  the  probability  of  exceedence 

R(x)  =  P(X  >  x)  =  1  -  F(x)  (3.1) 

is  called  the  survivor  function,  and  its  logarithm  is  the  log-survivor 
function.     It  can  be  easily  checked  from  (3.1)  and  the  pdf  of  an 
exponential  distribution  that  the  log-survivor  function  of  a  Poisson 
process  with  rate  of  occurrence  X  is  a  straight  line  with  slope  equal 
to  -  X.    In  analyzing  a  series  of  events,  deviations  of  the  empirical 
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log-survivor  function  from  a  straight  line  indicate  deviation  of  the 
process  from  the  Poisson.     In  particular,  for  a  renewal  process,  a 
convex  log-survivor  function  implies  a  coefficient  of  variation  less 
than  one,  while  the  opposite  holds  for  a  concave  log-survivor  function 
(Watson  and  Wells,  1961).     Such  relationships  are  helpful  in 
determining  the  marginal  distribution  of  the  interarrival  times  and  in 
identifying  possible  models  for  the  process. 

Counts  properties.    Let  {N^}  denote  the  counting  process  of  an 
asynchronous  point  process  (i.e.,  a  process  which  starts  at  an 
arbitrary  time),  and    {N^.' }    denote  the  counting  process  of  a 
synchronous  point  process  (i.e.,  a  process  which  starts  with  an 
arbitrary  event).    Notice  that  {N^'},  the  number  of  events  in  (0,t], 
is  the  counting  process  of  a  series  of  events  that  starts  with  an 
event  but  does  not  include  it.    The  obvious  relationship  between  the 
sequence  of  intervals  {X.}  and  the  counting  process  {N^'}is 


P(N^'  <  r 


)  =  P(x,  +  X,  + 


+  x^  >  t),  r  =  1,2 


, . . . 


(3.2) 


(Cox  and  Lewis,  1978). 


The  following  counting  properties  are  of  interest: 


(1)  The  mean  value  function,  M(t),  defined  as 


M(t)  =  E(N^). 


(3.3) 


For  any  stationary  process,  M(t)  =  t/E(X)  =  mt,  where  m  =  1/E(X)  is 
the  intensity  or  rate  of  occurrence  of  the  process. 
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(2)  The  renewal  function,  H{t),  defined  as 


H(t)  =  E(N^').  (3.4) 

For  large  t,  H(t)-»M(t).    For  a  Poisson  process  H(t)  =  mt. 

(3)  The  renewal  density,  or  conditional   intensity  function,  h(t), 

defined  as 


E(N  '    ,^.J  dH(t) 
h(t)  =  lim   L-i_I±At_  =   (3.5) 

At-K)         At  dt 


(Cox  and  Lewis,  1978).    Notice  that  h(t)  is  not  a  pdf,  but  instead 
h(t)At  is  the  probability  of  having  an  event  in  a  small  interval  At 
near  t.     Since  multiple  events  are  not  permitted  (this  is  the 
so-called  orderliness  requirement;  see  Daley  and  Vere-Jones,  1972), 
the  probability  of  more  than  one  event  in  an  interval  of  length  At  is 
0(At  ) ,  and  therefore: 


P( event  in  (t^+t,  t^i+t+At)  /  event  at  t^) 
h(t)  =  lim  ^  y  ^,  (3.6) 

At^O  At 

where  the  event  at       is  an  arbitrary  event  in  the  stationary  process. 
The  renewal  density  of  a  Poisson  process  is  constant  and  equal  to  the 
intensity  of  the  process,  m. 
(4)  The  variance  time  curve,  V(t),  defined  as 


V(t)  =  Var(N^). 


(3.7) 
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For  a  Poisson  process,  {N^}  has  a  Poisson  distribution  for  which  the 
variance  is  equal  to  the  mean,  and  therefore  V(t)=  mt.    Procedures  for 
estimating  the  empirical  variance  time  curve  are  given  in  Cox  and 
Lewis  (1978). 

(5)  The  index  of  dispersion  function,  I(t),  defined  as 


V(t)     V(t)  E(X) 

I(t)  =           =  ,  (3.8) 

M(t)  t 


which  has  the  constant  value  of  one  for  the  Poisson  process.  An 
empirical  I(t)  <  1  for  all  t  implies  underdispersion  relative  to 
Poisson,  and  an  I(t)  >  1  for  all  t  implies  overdispersion  (analogously 
to  the  coefficient  of  variation  of  the  interarrival  times  relative  to 
one,  the  value  for  the  exponential  distribution). 
(6)  The  covariance  density,    t^(t),  defined  as 


yJt)  =  lim  t^_t±A^  (3^9) 

At^  (At)"^ 

C0V(AN  AN  ) 

=  lim  — ^  

At->0  (At)^ 


which  can  be  interpreted  as  the  autocovariance  function  of  the 

differential  process  AN.  =  lim  N^  .       =    lim(N^.^..  -  N.).  The 

^     At-0  ^'^^^^      At-0  ^^^^  ^ 

differential  process,   {aN^},  can  be  thought  of  as  an  instantaneous 

process  having  zeroes  at  all   points  except  for  spikes  (delta 

functions)  at  the  points  of  occurrence  of  events.    The  covariance 

density,     Y+('r),  is  a  measure  of  the  likelihood  of  two  events 
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occurring  t  units  apart  (Cox  and  Lewis,  1978,  ch.4).  For  a  Poisson 
process  with  intensity  m, 

m    ,    for  T  =  0 

Y  +  (t)  =  (3.10) 
0    ,  otherwise. 

(7)  The  spectrum  of  counts,  g^(oj),  which  is  the  Fourier  transform  of 
the  covariance  density 


g+(u)) 


Y+(T)e  dx, 


(3.11) 


The  spectrum  of  counts  is  a  useful  tool  in  the  statistical  analysis  of 
series  of  events  and  is  preferable  to  other  functions  due  to  its 
superior  sampling  properties  (Bartlett,  1963).    For  a  Poisson  process 
the  spectrum  of  counts  has  a  constant  value  equal  to  m/ir. 

3.2.    Poisson  Versus  Bernoulli  Processes 
In  this  section,  the  properties  of  the  Bernoulli  process,  which 
is  the  discrete  analogue  of  the  Poisson  process,  are  studied  and 
compared  with  those  of  the  Poisson  process.    This  comparison  reveals 
that  if  indeed  the  discrete  daily  rainfall   occurrences  were  an 
independent  process,  i.e.,  a  Bernoulli  process,  if  modeled  as  a 
continuous  point  process  they  would  be  interpreted  as  underdispersed 
relative  to  the  (continuous)  Poisson  process.    On  the  other  hand, 
selected  daily  rainfall  structures  underdispersed  relative  to  the 
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Poisson  process  are,  in  fact,  all  shown  to  be  overdispersed  relative 
to  Bernoul 1 i . 

3.2.1.  Statistical  Properties  of  the  Bernoulli  Process 

Consider  a  sequence  of  independent  repeated  trials  with  two 

possible  outcomes,  success  and  failure.    Let  p  denote  the  probability 

of  success  at  each  trial  and  N    denote  the  number  of  successes  in  r 

r 

trials.    Then,      has  a  binomial  probability  distribution 

P(N^  =  k)  =  {[)  p'^(l-p)'''''       .  k  =  0,1.2.....  (3.12) 

and  the  number  of  trials  between  the  n'th  and  (n+l)st  success,  X^,  has 
a  geometric  distribution 

P(X„  =  k)  =  pd-p)""'^       ,  k  =  1,2....,  (3.13) 

for  all  n.    In  the  discrete-time  point  process  terminology,  a  success 
corresponds  to  the  occurrence  of  an  event  (i.e.,  a  rainy  day);  to 
the  counting  process,  that  is  the  number  of  events  in  (0,r];  and  to 
the  time  between  events. 

The  Bernoulli  process  is  the  discrete  analogue  of  the  Poisson 
process  in  the  sense  that  it  is  characterized  by  independent  intervals 
and  independent  counting  increments  and  is  discrete  in  time.  This 
lack  of  memory  property  is  the  result  of  the  geometric  distribution 
for  the  times  between  events,  analogously  to  the  exponential  for  the 
Poisson  (see  Feller,  1968,  p. 329  for  a  proof). 

The  statistical  properties  (i.e.,  mean,  variance,  and  higher 
moments)  of  the  geometric  and  binomial  distributions  are  well  known 


32 


(see  for  example,  Parzen,  1962).     For  this  work,  some  additional 
properties  of  the  Bernoulli  counting  process  are  of  interest,  such  as 
the  spectrum  of  counts,  and  this  derivation  is  given  below.    Since  it 
is  more  natural  to  discuss  the  daily  rainfall  occurrences  with  respect 
to  time  instead  of  trials,  the  familiar  terminology  of  continuous 
point  processes  has  been  retained,  with  the  understanding  that  time  t 
in  a  Bernoulli  process,  or  in  a  general  discrete  point  process, 
corresponds  to  t  discrete  time  units  (i.e.,  t  days). 

Let  f(x)  be  a  probability  density  function  (pdf)  defined  as  the 
continuous  representation  of  the  geometric  probability  mass  function 
(pmf)  of    (3.13).  Then, 

f(x)  =    Z    pd-p)*^'^  6(x-k),  (3.14) 
k=l 

where  6(-)  is  the  Dirac  delta  function.  Let  *f(s)  denote  the  Laplace 
transform  of  f(x)  defined  as 

CO 

*f(s)  =  f  e'^^f(x)dx. 
0 

The  symbol  *f(s)  is  used  to  indicate  the  Laplace  transform  of  a 
generalized  function  of  the  form  (3.14)  from  the  Laplace  transform 
f*(s)  of  a  standard  continuous  function  f(x).    Notice  that  *f(s)  is  an 
exponential  function  of  s,  since  the  Laplace  transform  of  6(x-k)  is 
o^(6(x-k))  =  e"^*^.    It  is  easily  shown  that  the  Laplace  transform  of 
f(x)  of  (3.14)  is 
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*f(s)  =   r    .  (3.15) 

1  -  (I-P)e-' 

Making  use  of  a  standard  result  of  the  renewal  theory  (see  for 
example.  Cox  and  Lewis,  1978,  Ch.4),  the  spectrum  of  counts  of  a 
stationary  renewal  point  process  is  given  as 


p  *f(i  Lo)  *f(-iaj) 

g  (oj)  =         [1  +  +  ].  (3.16) 

IT  l-*f(iw)  l-*f(-ioj) 


Substitution  of  (3.15)  into  (3.16)  yields 


p(l-p) 

g^(aj)  =   .  (3^17) 

IT 

The  statistical  properties  of  interest  for  a  Bernoulli  process, 
with  a  probability  of  success  p,  are  given  in  Table  3.1.  In  the  same 
table,  the  corresponding  properties  of  a  Poisson  process  with  rate  of 
occurrence  X  are  also  given. 

It  is  convenient  to  notice  here  that  *f(s)  =   \p{-s) ,  where  '^{') 
is  the  moment  generating  function  of  the  probability  law  of  x  (eq. 
3.14),  and  is  defined  as  ^Piz)  =  E[e^^],  i.e.,  as  the  expectation  of 
the  exponential  function  e^'^  (see,  for  example,  Parzen,  1960,  p. 215). 
The  equivalence  of  these  terms  will  be  used  in  Chapter  5. 
3.2.2    Comparison  of  a  Poisson  and  a  Bernoulli  Process 

Consider  a  sequence  of  daily  rainfall  occurrences  with  mean 
interarrival  time  x.    If  a  Bernoulli  process  were  fit  to  the  series, 
the  estimate  of  its  probability  of  success,  p,  would  be  p  =  1/x. 
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Table  3.1    Companson  of  Poisson  and  Bernoulli  Processes 


Interarrival  times:  {X^} 


Number  of  events:  {N^} 


Conaitional  intensity 
function 

Log  survivor  function 

Variance  time  curve 

Index  of  dispersion 
function 

Spectrum  of  counts 

Normalized  spectrum 
of  counts 


Poi sson 


X  =  rate  of  occurrence 
f(x)  =  Xe'^^  .  X  >  0 


E(X)  =  1/X 
Var(X)  = 


c  =  r 

V 


=  s  =  2" 


p(N^=k) 


(Xt)^e-^^ 


E(N^)  =  Xt 
Var(Nj.)  =  Xt 


h(t)  =  X 
in[R(x)]  =  -XX 
V(t)  =  Xt 

I(t)  =  1,  Vt 

g+(w)  =  x/w  ,  J  >_  0 

g;(uj)  =  1  ,  CO  >  0 


Bernoul 1 i 
p  =  prob.  of  success 
p(x)  =  p(l-p)''"^  ,  0  <  p  <  1 
E(X)  =  1/p 
Var{X)  =  (l-p)/p^ 
c^  =  /iT?  <  1 
2-p 

c.  =    >  2 

p(N^=k)=(J^)p''(l-p)*-^ 

t  =  discrete  time 
E(N^)  =  pt 
Var(N^)  =  p(l-p)t 

h(t)  =  p 

in[R(x)]  =    in(l-p)  X 
V(t)  =  p{l-p)t 

I(t)  =  1-p  <  1,  Vt 
g+(u;)  =  Pd-pj/if  ,^>.  0 

g;(^)  =  1-p  <  1  ,  .  >  0 


c^  =  coefficient  of  variation 
=  skewness  coefficient 
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Similarly,  if  a  Poisson  process  were  fit,  the  estimate  of  the  rate  of 
occurrence,  X,  would  be  X  =  1/x.    Thus  p  =  X.    Notice,  however,  from 
Table  3.1  how  different  the  other  properties  of  the  two  processes  are. 
In  particular,  the  Bernoulli  process  has  a  coefficient  of  variation  of 
intervals  and  an  index  of  dispersion  function  of  the  counts  always 
less  than  one,  which  imply  underdispersion  relative  to  Poisson.  This 
means  that  inferences  about  over-  and  under-dispersion  of  the  daily 
rainfall  occurrences  would  be  different  depending  on  whether  the 
empirical  functions  of  the  process  were  compared  to  those  of  a  Poisson 
or  to  those  of  a  Bernoulli  process.     It  seems  only  natural  that  a 
discrete  point  process  model,  such  as  daily  rainfall  occurrences, 
should  be  compared  with  the  discrete  independent  Bernoulli  process  and 
not  with  the  continuous  Poisson  process.     This  is  an  important 
observation  and  has  immediate  consequences  in  the  interpretation  of 
the  statistical  functions  of  the  daily  rainfall  occurrence  process. 
In  the  next  section,  the  effects  of  using  a  continuous  point  process 
model  for  the  generation  of  a  discrete  sequence  will  be  studied, 
analytically  for  a  Poisson  model  and  via  simulation  for  a  Neyman-Scott 
model . 

3.3    Effects  of  Discretization  on  a  Continuous  Point  Process 
When  a  continuous  point  process  is  used  for  generation  of 
(synthetic)  daily  rainfall  occurrences,  the  most  natural  approach  to 
discretizing  a  continuous  synthetic  sequence  is  to  lump  all  the 
occurrences  during  a  day  at  one  point,  such  as,  the  end  of  that  day. 
The  resulting  discrete  point  process  has  different  statistical 
properties  than  the  continuous  one.    How  much  these  two  structures 
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differ  will  be  illustrated  below,  first  for  a  Poisson  process  and  then 
for  a  Neyman-Scott  process. 

Let  F(x)  denote  the  cumulative  distribution  function  of  the 
exponential   pdf  of  the  intervals  of  a  Poisson  process.  The 
discretization  scheme  suggested  above  is  equivalent  to  replacing  the 
continuous  exponential    distribution  of  the   intervals  with  a 
discretized  one,  so  that 


P^{x  =  1)  =  Pjx  <  1)  =  F(l)  =  1  -  e"\ 

PjCx  =  2)  =  P^(l  <  X  <  2)  =  F(2)  -  F(l)  =  e"^l  -  e"^). 

(3.18) 


where  P^  and  P^  denote  probabilities  of  a  discrete  variable  and  a 
continuous  variable,  respectively.    Notice  that  the  resulting  discrete 
distribution,  P^(x),  is  geometric  with  parameter: 


implying  that  the  discretized  process  is  a  Bernoulli  process  with  a 
probability  of  occurrence  (or  rate  of  occurrence)  equal  to  y,  a  value 
always  less  than  x.     All  the  other  properties  of  the  discretized 
process  can  be  obtained  by  substituting  the  value  of  y  for  p  in  the 
right-hand  column  of  Table  3.1. 


k)  =  P^(k-1  <  X  <  k)  =  F(k)  -  F(k-l)  =  e 


-(k-l)x 


(1  -  e"^), 


(3.19) 
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For  data  generated  from  a  Poisson  process,  figures  3.1  and  3.2 
show  the  effects  of  discretization  on  some  commonly  used  counting 
properties,  i.e.,  the  spectrum  of  counts,  log-survivor  function, 
variance  time  curve  and  index  of  dispersion.    A  period  of  observation 
of  1000  time  units  (for  example,  days)  was  used,  since  this  is 
approximately  the  length  of  series  available  in  a  month  by  month 
analysis  of  thirty  years  of  daily  data.    Notice  the  agreement  of  the 
analytical  and  simulation  results;  the  empirical  functions  of  the 
discretized  process  differ  from  those  of  the  Poisson  and  the 
differences  are  larger  for  the  higher  rates  of  occurrence.  Also, 
notice  that  the  discretized  process  is  always  underdispersed  relative 
to  Poisson. 

Another  important  issue  raised  from  Figures  3.1  and  3.2  is  the 
data  requirements  to  obtain  reliable  estimates  of  the  empirical 
functions.    It  can  be  seen  from  Figure  3.1  that,  although  the  length 
of  observation  is  the  same  for  all  cases,  the  empirical  functions  of 
the  continuous  Poisson  are  closer  to  the  theoretical  ones  the  larger 
the  number  of  events  is.    This  implies  that  fewer  years  of  daily 
rainfall  data  during  rainy  seasons  contain  the  same  information  as 
more  years  during  dry  seasons,  and  therefore  caution  must  be  applied 
when   interpreting  the  statistical    properties   of  the  rainfall 
occurrences  during  seasons  with  few  rainy  days. 

Figure  3.3  illustrates  the  effect  of  discretization  on  a 
clustered  Neyman-Scott  process.    (This  process  and  the  meaning  of  its 
parameters  have  been  discussed  in  Chapter  2.)    Although  the  effects  of 
discretization  cannot  be  directly  associated  with  the  parameter  values 
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Figure  3.1    Effects  of  discretization  on  a  Poisson  process 
with  rate  of  occurrence  X  =  0.50. 
(A)  Continuous  process,  (B)  Discretized  process. 
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Figure  3.2    Effects  of  discretization  on  a  Poisson  process  with 
rate  of  occurrence  X  =  0.20. 
(A)  Continuous  process,  (B)  Discretized  process. 
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Figure  3.3    Effects  of  discretization  on  a  Neyman-Scott  process 
with  parameters  hg  =  0.20,  p  =  0.40,  and  p  =  0.70. 
(A)  Continuous  process,  (B)  Discretized  process. 
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as  for  a  discretized  Poisson  process,  it  is  still  apparent  that  the 
effects  are  similar  and  are  greater  the  more  clustered  the  process  is. 

3.4    Implications  on  Modeling  Daily  Rainfall 
Recently,  several  authors  have  had  apparent  success  with  the 
application  of  time-continuous  point  process  models  to  daily  rainfall 
observation  sequences.     In  this  chapter  we  have  shown  that  the 
practice  of  using  continuous  point  process  models  for  discrete 
observation  sequences  can  give  misleading  results  regarding  inferences 
about  over-  and  under-dispersion  of  the  process  and,  therefore, 
incorrect  conclusions  about  the  underlying  rainfall  generating 
mechanism.    Moreover,  continuous  point  process  models  cannot  be  used 
for  generation  of  daily  rainfall  sequences,  which  in  many  cases  may  be 
the  purpose  of  modeling  rainfall  in  the  first  place. 

A  discrete  point  process  modeling  approach  which  uses  the 
Bernoulli  process  (the  discrete  analogue  of  the  Poisson  process)  as 
its  basis  for  comparison  has  been  suggested.     Inferences  about 
clustering  (over-  and  under-dispersion)  in  daily  rainfall  should, 
therefore,  be  made  by  comparing  the  empirical  properties  of  the 
process  to  those  of  the  Bernoulli  and  not  to  those  of  the  Poisson,  as 
has  usually  been  the  practice. 

In  the  next  chapter,  six  daily  rainfall  time  series  from  stations 
throughout  the  U.S.  are  analyzed  to  give  further  insight  into  the 
structure  of  daily  rainfall  occurrence  processes.    On  the  basis  of  the 
preliminary  theoretical  analysis  given  in  this  chapter  and  the  results 
of  the  data  analysis,  the  inappropriateness  of  the  continuous  point 
process  models  for  daily  rainfall  is  conclusively  demonstrated. 


CHAPTER  4 

AN  EXPLORATION  OF  DAILY  RAINFALL  STRUCTURES 


The  Third  approached  the  animal, 
And  happening  to  take 

The  squirming  trunk  within  his  hands, 
Thus  boldly  up  and  spake: 

"I  see,"  quoth  he,  "the  Elephant 
Is  very  like  a  Snake!" 

Previous  studies  on  point-process  modeling  of  daily  rainfall 
occurrences  have  been  confined  to  the  analysis  of  a  single  season 
within  which  the  process  has  been  assumed   homogeneous  (i.e., 
stationary),   and/or  to  the  analysis   of  stations   with  similar 
probabilistic  structures,  i.e.,  stations  from  particular  geographic 
regions.    For  example,  Kavvas  and  Delleur  (1975)  analyzed  seventeen 
daily  rainfall  records,  all  from  Indiana,  and  applied  a  homogenization 
scheme  to  cope  with  trends  and  seasonality  over  the  seven-year  period 
studied.    A  time  varying  function,  x(t),  was  fit  to  the  mean  rate  of 
occurrence: 

7  ^ 

X(t)  =  exp    {a^  +  agt  +  a3t^  +       R.  sin(w.t  +  9-)}  .  (4.1) 

Under  the  Poisson  hypothesis,  the  original  time  increments.  At,  were 
rescaled  to  at  =  x(t)At,  where  at  is  referred  to  as  the  intrinsic  time 
scale.    In  eq.  (4.1),     ,  o.^^  and      are  parameters  to  model  the  long 
term  trends;  R.,    . ,  and  9^.  are  respectively  the  amplitude,  frequency 
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and  phase  angle  of  the  i'th  significant  periodicity  (Kavvas  and 
Delleur,  1975).    It  should  be  noted,  however,  that  this  homogenization 
scheme  removes  the  nonstationarity  only  from  the  first  moment  for  a 
non-Poisson  process,  such  as  daily  rainfall,  and  not  from  higher 
moments.    In  addition,  long  term  trends  and  periodicities  identified 
from  seven  years  of  data  cannot  reasonably  be  extrapolated,  and 
therefore  the  model  is  limited  to  analysis,  rather  than  generation  of 
synthetic  sequences.    Smith  and  Karr  (1983)  analyzed  the  summer  season 
(July  to  October)  rainfall  occurrences  for  seven  stations  in  the 
Potomac  river  basin.    Twenty-seven  years  of  daily  rainfall  occurrences 
for  Denver,  Colorado,  were  analyzed  by  Ramirez-Rodriguez  and  Bras 
(1982)  for  the  period  May  15  to  September  11,  and  by  Rodriguez-Iturbe 
et.  al.  (1984)  for  the  period  May  15  to  June  16. 

All  of  these  studies  found  that  the  daily  rainfall  occurrence 
process  is  overdispersed  relative  to  the  Poisson  process  (i.e.,  the 
clustering  of  events  is  more  random  than  in  a  Poisson  process)  and 
these  results  have  formed  the  basis  for  applications  of  continuous 
cluster  models,  such  as  the  Neyman-Scott  model,  discussed  in  Chapter 
2.    In  this  chapter,  it  will  be  shown,  using  six  records  of  daily 
precipitation  from  sites  throughout  the  continental  U.S.,  that:  (1) 
the  daily  rainfall  occurrence  process  during  many  seasons  of  several 
sites  is  actually  underdispersed  relative  to  the  Poisson,  and  (2)  more 
importantly,  as  shown  in  Chapter  3,  the  proper  basis  for  comparison  is 
the  (discrete)  Bernoulli  process,  with  respect  to  which  the  rainfall 
occurrence  process  is  overdispersed.    Moreover,  the  analysis  presented 
in  this  chapter  shows  that  the  structure  of  the  daily  precipitation 
has  strong  seasonal  variations;  in  many  cases  the  season-to-season 
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variation  in  model  structure  (as  opposed  to  model  parameters)  is  as 
significant  as  site-to-site  climatic  effects. 

4.1    Selection,  Description,  and  History  of  the  Stations  Analyzed 
Six  U.S.  stations  were  selected  for  the  analysis.    These  stations 
are  located  in  regions  of  different  climatologic  regimes  and  exhibit 
widely  different   rainfall    structures.      Figure  4.1    shows  the 
distributions  of  the  regional  monthly  depths  and  the  location  of  the 
stations.    The  station  locations  are  sufficiently  diverse  to  represent 
the  major  climatic  types  within  the  continental  U.S. 

Additional  information  on  the  stations  is  given  in  Table  4.1. 
The  effect  of  the  time  of  observation  on  the  daily  rainfall  structures 
is  not  thought  to  be  a  serious  problem.     A  different  time  of 
observation  might  have  resulted  in  a  different  daily  rainfall 
sequence,  but  the  rainfall  statistics  (i.e.,  sequence  of  wet  and  dry 
days)  are  not  likely  to  be  much  different  as  long  as  there  is  no 
significant  diurnal  periodicity  in  rainfall.    Of  course,  the  division 
of  a  storm  into  two  when  the  observation  time  falls  within  its 
duration  is  a  problem,  but  this  is  inherent  to  any  discretized 
sequence  of  a  continuous  process. 

Major  changes  in  the  location  of  the  recording  gages,  measurement 
equipment,  time  of  observation  etc.  could  introduce  artificial  trends 
in  the  recorded  sequences.    Therefore,  an  inspection  of  the  history  of 
the  analyzed  stations  was  performed.    All  the  changes  reported  in  the 
National  Oceanic  and  Atmospheric  Administration  (NOAA)  CI imatological 
Data  Publications  for  the  six  stations  of  interest  are  shown  in  Table 
4.2.    Apart  from  a  major  change  of  225  ft.   in  elevation  for  the 
station  at  Roosevelt,  Arizona,   the  other  changes   do  not  seem 


45 


o 
(/) 

c 
le 


01 


1X3 

c: 

o 

(/) 

+-> 

cn 

ro 

c 

c 

ro 

■t-> 

Sh 

> 

>) 

o 

3 

to 

na 

nn 

■o 

(/) 

to 

o 

<u 

O 

0) 

dj 

-a 

+-> 

s_ 

ro 

ro 

<J 

s_ 

to 

T3 

*• 

s- 

(U 

+J 

U. 

cC 

ro 

ro 

o 

O 

X 

<u 

0) 

cu 

o  -c 

'o 

in 

s- 

+-> 

1— 

-(-> 

i 

U- 

(/> 

</) 

<D 

c 

ro 

> 

c: 

■o 

o 

0) 

ro 

O) 

a- 

+J 

'i 

> 

+J 

ro 

o 

o 

(/) 

ro 

•r- 

ro  M- 

o 

3 

+-> 

OO  CC 

Q 

(/I 

x 

S- 

t — 1 

CM  CO 

LT)  to 

to 

>^ 

^ 

(0 

1—  -a 

0) 


46 


c 

o 

■(-> 

■(-> 

^ 

J= 

+j 

CD 

tT> 

•r- 

■p— 

> 

C 

c 

s_ 

cL 

-a 

■a 

■a 

T3 

Oi 

ly) 

e 

Ln 

E 

'i 

E 

E 

O 

1— 

o 

■t-J 

o 

un 

r~ 

CM 

o 

CO 

o 

i-H 

1—1 

CO 

> 

 , 

o 

Lf) 

CSJ 

CM 

LO 

UJ 

<U 

T3 

1 — t 

CTl 

CM 

LO 

CM 

3 

LO 

O 

1 — 1 

,—4 

LO 

•!-> 

•r- 

>— 1 

t— 1 

o 

LO 

^ 

-a 

CT> 

CM 

r— 1 

CTl 

CO 

O 

<u 

.— 1 

.—1 

1— ( 

FN 

Q 

>i 

f— 

tz 

<: 

■o 

(/) 

cn 

o 

00 

CO 

oo 

CO 

c 

•(-> 

cn 

«a- 

.— I 

LO 

o 

•r" 

■(-> 

co 

o 

Lf) 

Ol 

CTl 

4J 

CO 

CM 

ro 

C^ 

lO 

_l 

4-> 

OO 

r_ 

"O 

00 

p«» 

CO 

lO 

(U 

1 — 

4— 

M 

<Ti 

CTl 

C 

Ul 

>5 

1 

1 

1 

1 

1 

1 

(O 

CO 

00 

00 

00 

a: 

Ol 

c 

«=»• 

•id- 

>- 

CTl 

<Ti 

a> 

CTl 

>, 

a 

X 

</> 

(U 

c 

CO 

. — 1 

00 

ro 

CTl 

o 

jC 

o 

CO 

CM 

CO 

CM 

+-> 

CM 

CO 

CM 

o 

LD 

IX) 

CM 

1 

1 

1 

o 

Q 

LO 

CM 

I— ( 

00 

CO 

LO 

00 

»— « 

o 

«:}- 

o 

ro 

o 

c 

o 

c 

CL 

o 

<t 

c 

(O 

1—4 

Ll_ 

0) 

Q. 

Q. 

I— 1 

■t-) 

Q. 

Q. 

< 

• 

e 

< 

'a! 

ITS 

> 

c 

■o 

OJ 

O) 

QJ 

-i-> 

OJ 

a- 

CO 

+-> 

E 

> 

13 

E 

o 

O 

•r- 

+-> 

"3 

O 

x: 

cu 

1— 

OO 

zr 

2 

Q. 

Q 

47 


Table  4.2    History  of  the  Six  Rainfall  Stations  Analyzed 


1.  Snoqualmie  Falls,  Washington  (ID: 

February  1953:    Latitude  from  47°  31'  to  47°  33' 
February  1958:    Elevation  from  430  ft.  to  440  ft. 
April       1967:    Observation  time  from  5pm  to  midnight 

2.  Roosevelt,  Arizona  (ID:  02-7281) 


July        1954:    Observation  time  from  7am  to  Sam 
October    1961:    Elevation  from  2230  ft.  to  2005  ft. 
November  1979:    Observation  time  from  8am  to  7am 


3.  Austin  WSO  Ap. ,  Texas  (ID:  41-0428) 

July        1961:    Elevation  from  615  ft.  to  597  ft. 
January    1970:    Equipment  from  weighing  to  recording 

4.  Miami  WSO  Ap. ,  Florida  (ID:  08-5663) 

June        1958:    Latitude  from  25°  49'  to  25°  48' 
Longitude  from  80°  17'  to  80°  16' 
Elevation  from  8  ft.  to  7  ft. 

May         1977:    Longitude  from  80°  16'  to  80°  18' 
Elevation  from  7  ft.  to  12  ft. 

5.  Philadelphia  WSO  Ap. ,  Pennsylvania  (ID:  36-6889) 

October    1953:    Elevation  from  20  ft.  to  13  ft. 
January    1958:    Longitude  from  75  14  to  75  15 
Elevation  from  13  ft.  to  7  ft. 
May         1965:    Elevation  from  7  ft.  to  5  ft. 
April       1976:    Elevation  from  5  ft.  to  10  ft. 

6.  Denver  WSO  Ap. ,  Colorado  (ID:  05-2220) 

March  1981:  Elevation  from  5298  ft.  to  5292  ft. 
June  1963:  Elevation  from  5292  ft.  to  5283  ft. 
January    1970:    Latitude  from  39    45'  to  39°  46' 

Longitude  from  104°  53'  to  104°  52' 
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significant  enough  to  have  had  major  effects  in  the  measured  rainfall. 
It  should  be  noted  that  none  of  the  six  stations  is  located  in  a  major 
downtown  metropolitan  area  so  the  direct  effects  of  urbanization 
should  not  have  been  significant. 

To  verify  the  stationarity  of  the  records,  a  graphical  trend 
analysis  on  both  the  occurrence  and  amounts  processes  was  conducted. 
Figures  A.7-A.12  of  Appendix  A  show  plots  of  the  total  number  of  rainy 
days  over  a  year  and  the  total  annual  rainfall  amounts  as  functions  of 
the  year  for  all  the  six  stations.    No  significant  trends  in  either 
the  occurrence  or  the  amounts  process  are  apparent  from  this  graphical 
analysis.     Formal  statistical  tests,  such  as  Cramer's  statistic 
(Cramer,  1946)  for  a  trend  in  the  rate  of  occurrence  of  events,  were 
not  applied  since  these  tests  require  a  Poisson  hypothesis  and  their 
performance  is  unknown  when  the  true  process  is  clustered. 

4.2    Statistical  Analysis  of  Daily  Rainfall  Sequences 
Thirty  years  of  daily  rainfall  during  the  period  1948  to  1977 
(1949-1978  for  Miami)  were  analyzed  for  the  six  stations  shown  in 
Table  4.1.    The  statistical  properties  of  the  occurrence  processes 
(i.e.,  dependence  structure  and  first-  and  second-order  properties  of 
the  non-zero  precipitation  sequences)  were  estimated  from  the  daily 
rainfall  data.    In  addition,  the  cross-correlation  functions  of  the 
amounts  with  the  preceding  and  following  interarrival  times  were 
estimated. 

4.2.1    Seasonality  of  Daily  Rainfall  Sequences 

The  daily  rainfall  process  is  a  non-stationary  (periodic)  process 
for  both  the  rate  of  occurrence  of  events  and  the  daily  amounts. 
Therefore,  a     time-varying  model   is  needed  to  accommodate  this 
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non-stationarity.    Apart  from  the  case  of  a  simple  Poisson  model  with 
time-varying  rate  of  occurrence,  the  generalization  of  other  models, 
such  as  Poisson  cluster  models  and  doubly  stochastic  Poisson  models, 
is  in  most  cases  mathematically  intractable  (Cox  and  Lewis,  1978; 
Srinivasan,  1974;  and  others).    Use  of  a  homogenization  scheme,  such 
as  that  of  Kavvas  and  Delleur  (1975),  to  transform  the  data  prior  to 
the  data  analysis  is  rejected  for  two  reasons:     (1)  homogenization 
schemes  are  based  on  the  Poisson  hypothesis  and  therefore  remove  the 
non-stationarity  only  from  the  first  moment,  and  (2)   the  inverse 
transformation  is  not  valid  for  a  non-Poisson  process  and  therefore 
the  model  cannot  be  used  for  generation  purposes.    Hence,  it  seems 
that  the  best  approach  is  to  model  the  daily  rainfall  process  by 
seasons  within  which  the  process  is  assumed  homogeneous.  This 
approach  has  been  followed  herein. 

The  transient  effects  caused  by  crossing  from  one  season  to  the 
next  are  neglected  in  this  formulation.    For  the  formation  of  the 
daily  rainfall  occurrence  series,  a  dry  period  (i.e.,  an  uninterrupted 
sequence  of  dry  days)  was  assigned  to  the  month  or  season  in  which  it 
started,  regardless  of  the  ending  month  or  season.    In  other  words,  if 
the  last  rainy  day  in  July  was  on  July  25,  and  the  next  rainy  day  was 
on  August  10,  a  dry  period  of  16  days  was  assigned  to  the  month  of 
July.    This  is  believed  to  be  the  most  natural  approach  to  handle  the 
transient  effects  from  season  to  season.    Other  workers  (e.g.,  Chang 
et  al.,  1984)  have  used  an  abrupt  transition  between  seasons;  for  the 
above  example,  their  approach  would  have  assigned  a  dry  sequence  of  6 
days  to  July  and  a  dry  sequence  of  10  days  to  August. 
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An  important  issue  in  modeling  the  non-homogeneous  daily  rainfall 
structure  is  the  selection  of  seasons.     One  approach  would  be  to 
consider  each  month  as  a  separate  season.    However,  grouping  the  data 
into  seasons  of  more  than  one  month  each  is  desirable  for  several 
reasons:  (1)  the  sample  size  of  data  for  the  estimation  of  the  model 
parameters  is  increased;  (2)  the  representation  of  the  process  is  more 
parsimonious;  (3)  transient  effects  from  season  to  season  are  reduced; 
and  (4)  computational  effort  is  reduced.    An  alternate  approach  is  to 
separate  the  year  into  seasons  of  equal  length  based  on  preliminary 
statistical  analysis  of  the  average  number  of  events  per  month, 
average  storm  depths,  and  other  summary  statistics.    However,  the 
proper  selection  of  seasons  depends  not  only  on  the  number  of  events 
in  a  given  period,  but  also  on  the  distribution  of  events  within  that 
period.    Second-order  properties  of  counts,  which  provide  information 
about  the  distribution  of  events,  should  therefore  be  used  in  season 
identification  procedures.    A  season  discrimination  methodology,  based 
on  all  the  statistical  properties  of  intervals  and  counts,  will  be 
discussed  and  implemented  in  Chapter  7.    The  first  step,  however,  is 
the  selection  of  a  small  time  period  (i.e.,  a  few  days  or  one  month) 
over  which  the  process  can  be  safely  assumed  homogeneous.  The 
statistical  properties  of  the  process  within  each  of  these  small 
periods  are  then  analyzed  and  compared  so  that  those  periods  with 
similar  statistical  structures  can  be  grouped  together.  Unless 
predictable  climatic  changes  are  known  to  occur  within  a  month,  a 
monthly  period  can  usually  be  assumed  homogeneous.    In  this  work,  a 
month  by  month  statistical  analysis  has  been  carried  out  as  a  first 
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step  for  all  six  stations.    Based  on  the  results  of  this  analysis, 
longer  homogeneous  periods  (seasons)  are  identified  in  Chapter  7. 
4.2.2    Second-Order  Properties  of  Intervals  and  Counts 

The  sequence  of  interarrival  times  (times  between  events)  is  a 
discrete  positive  valued  sequence,  whose  dependence  structure  and 
marginal  pdf  are  to  be  identified.    Table  A.l  gives  the  first  five 
autocorrelation  coefficients  of  the  interarrival  time  sequences  for 
the  six  stations.    An  approximate  test  for  their  significance  results 
from  assuming  that  the  autocorrelation  coefficient,  p.,  has  a 
N(u,a^)  =  N(0,  l/(n-j))  distribution.    Lewis  et  al.  (1969)  comment 
that  this  test  is  applicable  "provided  that  the  marginal  distribution 
of  intervals  is  not  too  highly  skewed  and  that  the  number  of  events  is 
greater  than  100."    Using  this  test,  the  significance  (at  the  5 
percent  and  1  percent  levels)  of  the  autocorrelation  coefficients  has 
been  tested  and  the  results  are  shown  in  Table  A.l.     Only  a  few 
autocorrelation  coefficients  were  significant.    However,  this  test  is 
weak  for  skewed  data  and  not  directly  appropriate  for  discrete  time 
series.     Non-parametric  tests,  such  as  exponential-score  product 
moment  statistics  (Cox  and  Lewis,  1978),  are  particularly  useful  for 
short  and  highly  skewed  series,  but  due  to  the  problem  of  ties  in  the 
series  of  interarrival  times,  they  are  difficult  to  apply. 

Another  way  of  testing  independence  of  discrete  data  could  be  to 
use  tests  for  independence  in  Markov  chains  (Bi 1 1 ingsley,  1961).  For 
example.  Cox  and  Lewis  (1978,  p. 177)  present  a  case  of  a  discrete 
point  process,  where  the  standard  test  on  the  autocorrelation  function 
failed  to  indicate  significant  dependence  in  the  series  of  intervals, 
whereas  significant  dependencies  were  identified  from  a  contingency 
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table  of  conditional  transition  probabilities.    For  the  daily  rainfall 
occurrences,  informal  tests  have  indicated  significant  autocorrelation 
structures.      These   tests    consist   of   comparing  conditional 
probabilities  of  transition  to  intervals  of  lengths  i^,  i=l,2,..., 
from  intervals  of  a  particular  length  i,  p{l^/l),  i.e.,  p(l/l), 
p(2/l),  p(3/l),  etc.    These  conditional  probabilities  should  not  be 
significantly  different  for  an  independent  process;  however,  this  was 
not  the  case  for  the  daily  rainfall  sequences.    Therefore,  in  general 
it  was  concluded  that  the  daily  rainfall  occurrences  at  the  stations 
analyzed  are  not  generated  from  an  independent  Bernoulli  process. 

Table  A. 2  shows  the  mean,  variance,  coefficient  of  variation,  and 
skewness  coefficient  of  the  interarrival  times  for  all  six  stations. 
The  coefficient  of  variation  is  not  always  greater  than  one  (recall 
that  values  less  than  one  imply  a  process  underdispersed  relative  to 
the  Poisson).    In  particular,  the  winter  months  (October  -  February) 
for  Snoqualmie  Falls,  the  summer  months  (May,  June)  for  Roosevelt,  the 
summer  months  (June  -  September)  for  Miami,  and  most  of  the  months 
(January  -  April,  June,  July,  November,  and  December)  for  Philadelphia 
have  a  coefficient  of  variation  less  than  unity.    Therefore,  for  all 
these  months,  the  Poisson  cluster  models  and  the  renewal  Cox  models 
are  precluded  since  both  have  a  coefficient  of  variation  of  intervals 
greater  than  one. 

Figures  A.7-A.12  of  Appendix  A,  show  the  empirical  normalized 
spectrum  of  counts,  log-survivor,  variance  time  curve  and  index  of 
dispersion  functions  on  a  monthly  basis  for  the  six  stations  studied. 
On  the  same  plots,  the  corresponding  functions  for  a  Poisson  process 
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have  been  plotted.    Recall  from  Table  3.1  of  Chapter  3  that  the 
corresponding  functions  for  a  Bernoulli  process  are  as  follows: 

Normalized  spectrum  of 
counts:  Constant  line  of  1  for  Poisson 

(1-m)  for  Bernoul 1 i 

Log  survivor  function:        Straight  line  with  slope  -m  for  Poisson 

In(l-m)  for  Bernoull i 

Variance  time  curve:  Straight  line  with  slope  m  for  Poisson 

m(l-m)  for  Bernoul 1 i 

Index  of  dispersion:  Constant  line  of  1  for  Poisson 

(1-m)  for  Bernoull i 

where  m  is  the  estimated  rate  of  occurrences  of  the  process.    In  view 
of  the  above  and  the  discussion  in  Chapter  3,  interpretations  of  these 
functions,  i.e.,  clustering  (over-  or  under-dispersion)  relative  to 
the  Poisson  and  Bernoulli  processes  is  possible.     Consider,  for 
example,  the  month  of  January  for  Snoqualmie  Falls.    The  theory  of 
continuous  point  processes  would  infer  that  this   process  is 
underdispersed  relative  to  the  Poisson,  implying  that  events  occur 
more  regularly  than  in  a  Poisson  process.    However,  these  functions 
show  that  the  process  is  overdispersed  relative  to  the  Bernoulli  . 
process,  that  is,  rainfall  events  occur  more  randomly  than  in  an 
independent  discrete  point  process.    The  above  example  illustrates  the 
inappropriateness  of  the  continuous  point  process  theory  for  modeling 
the  discrete  daily  rainfall  occurrences. 
4.2.3    Second-Order  Properties  of  the  Rainfall  Amounts 

The  sequence  of  non-zero  rainfall  amounts  is  a  continuous 
positive  time  series  whose  autocorrelation  structure  and  marginal  pdf 
are  to  be  identified.    Table  A. 3  gives  the  first  five  autocorrelation 
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coefficients  of  these  sequences  for  all  twelve  months  for  the  six 
stations  analyzed.    Only  a  few  were  significantly  different  from  zero, 
for  example,  the  first  autocorrelation  coefficient  of  the  winter 
months  (December  through  April)  for  Snoqualmie  Falls.    It  should  be 
noted  that  due  to  the  non-normality  of  these  sequences,  the  standard 
ARMA-type  models  cannot  be  used.     Depending  on  the  marginal  pdf's, 
either  a  normalization  transformation  may  be  applied  on  the  data  and 
standard  ARMA  models  be  used  or  the  exponential  ARMA  (EARMA)  or  Gamma 
AR  (GAR)  models  of  Lawrance  and  Lewis  (1977)  may  be  used  directly. 
More  references  on  the  EARMA  and  GAR  models  have  been  given  in  Chapter 
2. 

Table  A. 4  gives  the  statistics  of  the  storm  depth  sequences.  The 
coefficient  of  variation  is  always  greater  than  one,  and  varies  from 
1.07  to  1.30  for  Snoqualmie  Falls,  1.12  to  1.58  for  Roosevelt,  1.36  to 
1.78  for  Austin,  1.30  to  2.23  for  Miami,  and  1.12  to  1.55  for 
Philadelphia,  and  1.07  to  1.88  for  Denver. 
4.2.4    Cross-Correlational  Properties  of  Intervals  and  Amounts 

The  cross-correlation  coefficients  of  the  event  rainfall  amounts 
with  the  interarrival  times  preceding  and  following  that  event  are 
given  in  Table  A. 4  for  all  six  stations.    Only   Snoqualmie  Falls  has  a 
significant  cross  correlation  between  the  daily  rainfall  amounts  and 
the  immediately  following  interarrival  time  for  the  months  of  July 
through  January.    Except  for  occasional  exceptions,  the  other  stations 
do  not  show  significant  cross  dependence  structure. 

4.3    Discussion  on  the  Second-Order  Properties  of  Intervals  and  Counts 

In  this  section  a  more  detailed  discussion  is  given  of  the 
statistical  properties  shown  in  Figures  A.7-A.12  (i.e.,  spectrum  of 
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counts,  log-survivor  function,  variance  time  curve  and  index  of 
dispersion).    The  last  two  functions  are  straightforward.    The  only 
point  to  be  made  is  that  their  values  at  high  lags  (i.e.,  long 
interval  lengths)  are  of  interest,  since  these  values  will  better 
depict  the  deviations  of  the  process  from  an  independent  (Poisson  or 
Bernoulli)  process.    For  small  time  intervals  (i.e.,  a  few  days)  many 
processes  appear  to  be  independent  (local  independence).    However,  the 
estimation  of  these  functions  should  not  extend  to  more  than  about 
20-25  percent  of  the  length,  Tq,  of  the  observed  series  to  avoid 
excessive  sampling  variability.    Cox  and  Lewis  (1978,  p.  116)  discuss 
several  estimators  for  the  variance  time  curve,  as  well  as  sampling 
properties  of  these  estimators. 

The  log  survivor  function  has  been  plotted  in  discrete  time  to 
illustrate  the  discreteness  of  the  interarrival  times.    For  example, 
for  an  interarrival  time,  x  =  Xq,  multiple  points  (triangles)  are 
shown  on  the  plot  to  illustrate  the  number  of  ties,  i.e.,  number  of 
intervals  of  length  Xq.    To  interpret  the  log-survivor  function,  i.e., 
concavity  or  convexity  and  slope,   only  the   lowermost  points 
(triangles)  at  each  entry  are  needed.     Also,  the  full  length  of 
interarrival  times  has  been  retained  to  illustrate  extreme  situations. 
These  extreme  points,  however,  are  less  reliable  and  should  be  given 
less  weight  when  the  log-survivor  function  is  used  for  model  fitting. 

The  spectrum  of  counts  needs  special  attention  because  of  the 
discreteness  of  the  data.     For  a  continuous  point  process  where, 
theoretically  at  least,  events  can  occur  arbitrarily  close  to  each 
other,  the  spectrum  of  counts  extends  to  infinite  frequencies  lo  >  0. 
For  the  daily  rainfall   occurrences,  however,  events  cannot  occur 
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closer  than  one  day  apart  and  this  introduces  a  cutoff  frequency 
(Nyquist  frequency),       =  ir,  or  equivalently  fj^  =  1/2  days"^.  The 
value  plotted  in  the  abscissa  of  the  spectrum  of  counts  plots  is 
called  the  frequency  factor  and  is  defined  as  j  =  ojT/Zu  where  T  is  the 
total   length  of  observation.     Therefore,  the  frequency  factor 
corresponding  to  the  Nyquist  frequency  is  jj^  =  T/2.     This  is  the 
maximum  value  over  which  the  spectrum  of  counts  should  be  computed. 
Guttorp  and  Thompson  (1983)  discuss  aliasing  of  the  spectrum  of  counts 
estimated  from  discrete  sampled  counting  processes.    They  show  that 
this  can  be  severe,  especially  when  the  spectrum  of  counts  does  not 
decrease  rapidly  with  respect  to  the  sampling  interval.    For  the  daily 
rainfall  occurrences,  the  estimated  spectrum  of  counts  began  rising  at 
high  frequencies,  apparently  due  to  aliasing,  but  the  effects  of 
aliasing  introduced  into  lower  frequencies  cannot  be  easily  assessed. 

Lewis  (1970)  gives  a  useful  discussion  of  the  theory,  computation 
and  application  of  the  spectrum  of  counts.     For  a  discrete  point 
process  he  proposes  a  different  estimator  for  the  spectrum  of  counts. 
This  estimator  is   based   on   the   Fourier   transform  of  the 
autocorrelation  function  of  the  binary  series  of  zeros  and  ones.  The 
reader  is  also  referred  to  Bartlett  (1963)  for  a  lengthy  discussion  of 
the  spectral  analysis  of  point  processes.    Sampling  properties  of  the 
spectral  estimates  are  also  given  in  the  above  papers  and  in  Cox  and 
Lewis  (1978,  p. 126). 

Notice  that  the  normalized  spectra  of  counts  for  most  of  the 
months  decrease  with  increasing  frequency  to  a  value  less  than  one  and 
approximately  equal   to  1-x,  where  X  is  the  estimated  rate  of 
occurrence.    For  the  months  that  have  coefficients  of  variation  less 
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than  one,  the  spectrum  of  counts  is  either  approximately  constant 
(indicating  an  independent  Bernoulli  process)  or  increases  slightly 
for  low  frequencies  and  then  decreases.    Such  spectra  of  counts  are 
usually  consistent  with  variance  time  curves  below  that  of  the  Poisson 
process.     This   indicates  underdispersion  relative  to  Poisson. 
However,  most  of  these  structures  are  overdispersed  relative  to  the 
Bernoulli,  since  the  variance  time  curve  of  the  Bernoulli  process  has 
a  slope  equal  to  x(l-x)  <  X. 

It  should  be  noted  that  inferences  about  clustering  require  only 
the  shape  of  the  spectrum  (decreasing  or  constant,  etc.)  and  not  the 
absolute  values.     In  that  sense,  the  clustering  found  in  daily 
rainfall   occurrences  is  still   valid,   the  only  change  is  that 
clustering  should  be  assessed  relative  to  a  discrete  independent 
Bernoulli  point  process  rather  than  the  continuous  Poisson. 

4.4    Need  for  a  Discrete  Clustered  Point  Process  Model  for  Daily 

Rainfall 

This  chapter,  together  with  Chapter  3,  has  demonstrated  that 
continuous  point  process  models  are  not  appropriate  for  modeling  daily 
rainfall  sequences.     In  addition,  using  the  proposed  discrete-time 
point  process  methodology,  it  has  been  shown  that,  indeed,  the  daily 
rainfall  process  is  a  clustered  overdispersed  process  i.e.,  the 
rainfall  events  tend  to  occur  more  randomly  than  in  an  independent 
arrival  process.    Therefore,  the  need  for  discrete  clustered  point 
process  models  for  daily  rainfall  sequences  has  become  apparent.  The 
development  of  such  a  model  is  the  subject  of  the  next  chapter. 


CHAPTER  5 

DEVELOPMENT  OF  DISCRETE  POINT  PROCESS  MODELS  FOR  THE  DAILY 

RAINFALL  OCCURRENCES 


The  Fourth  reached  out  an  eager  hand. 
And  felt  about  the  knee. 

"What  most  this  wondrous  beast  is  like 
Is  mighty  plain,"  quoth  he: 

"'Tis  clear  enough  the  Elephant 
Is  very  like  a  tree!" 

In  this  chapter,  a  discrete  clustered  point  process  model  is 
defined  and  developed.    The  model  belongs  to  the  general  class  of 
Markov  renewal  models  which  were  introduced  by  Smith  (1955),  and  later 
studied  by  Pyke  (1961  a,b)  and  Cox  (1963).    An  extensive  bibliography 
of  theoretical  developments  and  applications  of  the  Markov  renewal 
models  is  given  by  Teugels  (1976).    In  the  words  of  ginlar  (1975),  a 
Markov  renewal  process  can  be  pictured  as  follows:     "Think  of  a 
particle  which  moves  from  one  state  to  another  with  random  sojourn 
times  in  between;  the  successive  states  visited  form  a  Markov  chain, 
and  a  sojourn  time  has  a  distribution  which  depends  on  the  state  being 
visited  as  well  as  the  next  state  to  be  entered"  (^inlar,  1975,  p. 
313).    In  the  most  general  Markov  renewal  process  with  k  states,  it  is 
assumed  that  there  are  k    different  type  of  intervals  (sojourn  times), 
independently  distributed  with  probability  distributions  ^^^(x) 
(i,j  =  1,         k) ,  which  are  sampled  in  accordance  with  a  Markov  chain 
with  transition  probability  matrix  P^.    Thus,  if  the  Markov  chain  has 
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made  a  transition  to  state  i  and  the  next  transition  is  to  state  j,  an 
event  of  probability  p..,  then  the  time  between  these  transitions  has 

^  J 

probability  distribution  f^.j(x). 

Markov  renewal  theory  combines  elements  of  Markov  chain  theory 

and  renewal  theory  to  give  more  general  non-Markovian,  non-renewal 

processes.     It  will   be  seen  later  that  Markov  chains,  Markov 

processes,  renewal  processes  and  alternating  renewal  processes  are  all 

special  cases  of  a  general  Markov  renewal  process.    It  should  be  made 

clear  that  the  times  between  events  are  not  independent  as  the  term 

renewal  implies,  but  instead  are  conditionally  independent.  This 

conditional  independence  gives  a  limited  Markov  property  to  the 

process,  in  the  sense  that  the  future  of  the  process  is  independent  of 

the  past  given  the  present  state,  provided  that  the  present  is  an 

occurrence  of  an  event.    The  above  statement  is  an  informal  definition 

of  the  Markov  renewal  process.    A  formal  definition  follows: 

DEFINITION:    For  each  neN,  let  a  random  variable  S    take  values  in  a 

n 

countable  set  of  states  E  =  (1,2,  ...}  ,  and  a  random  variable  T  take 

n 

values  in        =  [0,+»)  such  that  0  =  Tq  _<  T^  _<  T^  <.  ...     .  The 
stochastic  process  (S,T)  =  {S^,  T^  ;  n    N}  is  said  to  be  a  Markov 
renewal  process  with  state  space  E  provided  that 


P{S 


n+1 


=  J  , 


T 


n+1 


S 


•  •  • » 


S 


n 


T, 


. . . , 


=  P{S 


n+1 


=  j  ,  T 


n+1 


-  T„  <  t  I  S„} 
n  n 


for  all  ne  N  ,  jeE  ,  and  te      (^inlar,  1975,  p.  313). 

Many  authors  (for  example.  Cox  and  Lewis,  1978;  Cox  and  Isham, 
1980)  refer  to  the  Markov  renewal  processes  as  semi-Markov  processes. 
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while  others  make  a  distinction  between  the  two  terms  and  reserve  the 
term  semi-Markov  for  the  state  of  the  process  as  a  function  of  time 
(see  for  example  (^inlar,  1975  ,  p.  316).     In  this  work,  the  terms 
Markov  renewal  and  semi -Markov  refer  to  the  same  process  (defined 
above)  and  are  used  interchangeably,  with  preference  on  the  term 
semi-Markov. 

We  will  consider  here  only  the  case  where  there  are  two  types  of 
intervals  i.e.,  a  two-state  semi-Markov  process.    In  the  next  section 
the  definition  and  statistical  properties  of  a  general  two-state 
semi-Markov  process  are  presented.     In  section  5.2,  a  specific 
discrete  semi-Markov  model   for  the  daily  rainfall   occurrences  is 
defined  and  its  statistical  properties  derived. 

5.1    Statistical  Properties  of  a  General  Two-State  Semi-Markov  Model 

In  a  two-state  semi-Markov  model  it  is  assumed  that  there  are  two 
types   of   intervals   sampled   from  two   different  probability 
distributions,  f^(x)  and  f2(x),  according  to  a  probability  transition 
matrix: 


a,  1-a, 

P  =(  M  ,    0  <  a,,  a„  <  1.  (5.1) 

l-a^  a^ 


In  equation  (5.1), 

a^  =  prob(type  1  interval  — ^type  1  interval), 
a^  =  prob(type  2  interval  — *- type  2  interval), 

or  alternatively,  given  that  the  interval  x^._^  has  the  pdf  fj^(x),  the 
probability  that  x.  has  the  pdf  f2(x)  is  l-a^  etc.    Notice  that  if 
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all  the  intervals  take  the  same  constant  value  with  probability  one, 
then  a  one-state  semi-Markov  model  reduces  to  a  Markov  chain.  In 
other  words,  the  semi-Markov  process  can  be  viewed  as  a  generalization 
of  a  Markov  chain  process  in  which  the  time  spent  in  a  particular 
state  between  transitions  is  no  longer  geometrically  distributed. 

Let  the  row  vector  R^^"^  =  (p^^'^\p2^"h  denote  the  probability 
that  the  n'th  interval  will  be  of  type  1  or  type  2  when  the  initial 
probability  of  the  first  interval  being  of  type  1  or  type  2  is  given 
by  R^°^  =  (p^^°\p2^°^).    It  can  be  easily  shown  (Cox  and  Miller, 
1965)  that 


=  r("-1)  p  =  R(n-2)  p2     ...     ^i^)p\  (5.2) 


Thus,  given  the  initial   probabilities  and  the  transition 

probability  matrix  £,  the  probability  that  the  n'th  interval  will  be 
of  type  1  or  type  2  can  be  found.    The  matrix      is  called  the  n-step 
transition  probability  matrix,  and  the  probabilities  P^^^^  ^2^^^ 
called   interval-type   probabilities    (in   contrast   to   the  state 
occupation  probabilities  of  a  Markov  chain  process). 

After  a  sufficiently  long  period  of  time,  the  system  settles  down 
to  a  condition  of  statistical  equilibrium  in  which  the  interval-type 
probabilities  are  independent  of  the  initial  conditions.    Then,  there 
is  an  equilibrium  probability  distribution     e  =  (e^,e2),  which, 
letting  n  ^  »  in  (5.2),  satisfies 

e=eP.  (5.3) 
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The  solution  of  (5.3)  with  respect  to  the  row  vector^,  subject  to  the 
constraint  e^+e2  =  1,  e^  ^2  -  ^ '  the  equilibrium  interval -type 

probabilities  associated  with  the  transition  probability  matrix  £  of 
(5.1)  as 


1-a, 


1-a, 


^1  = 


2-a^-a2 


^2  '- 


2-^1-^2 


(5.4) 


(e.g..  Cox  and  Miller,  1965).    For  instance,  the  probability  e^  is  the 
unconditional  probability  that  an  arbitrary  interval  will  be  of  type 
1.    Note  that 


e^  +  e^  =  1. 


(5.5) 


From  the  theory  of  Markov  chains  we  know  that 


,(n) 


^1  ^2 
^1  ^2 


+  (a^+a2-l) 


^2  -^2 
-^1  ^1. 


®1  ^2 
^1  ^2 


and  therefore  from  (5.2) 

R(n)  _p(0) 

so  that  the  system  tends  to  a  statistical  equilibrium  with  a  rate 
depending  on  the  value  of  (a,  +  a^  -  1)"    which  tends  to  zero  as  n 


^1  ^2 
^1  ^2 


=  (e^,e2)  =  e. 
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increases  (Cox  and  Miller,  1965).    The  value  of  (a^  +32-1)  is  less 
than  unity  in  modulus,  except  in  the  trivial  cases  (i)       +       =  0, 
i.e.,  a^  =  0,  32  =  0  in  which  the  system  alternates  deterministical ly 
between  the  two  states  and  (if  the  initial  state  is  given  the  behavior 
of  the  system  is  non-random),  and  (ii)  a^  +  a2  =  2,  i.e.,  a^  =  1,  82  = 
1  in  which  the  system  remains  forever  in  its  initial  state.    For  a^  + 
a2  =  1  the  process  is  a  renewal   process,  and  the  transition 
probabilities  of  the  Markov  chain  are  equal   to  the  equilibrium 
probabilities,  i.e.,  a^  =  e^  and  a2  =  eg. 
5.1.1    Interval  Properties 

The  pdf  of  the  intervals  of  the  process  is  given  as 

f(x)  =  e^f^(x)  +  e2f2(x),  (5.6) 

where  e^  and  e2  are  the  equilibrium  probabilities  given  in  (5.4).  It 
is  easy  to  show  that  the  mean,  variance,  and  survivor  function  of  the 
interarrival  times  x  are  given  as 

E(X)  =  e^y^  +  e2M2.  (5.7a) 
var(X)  =  e^o^^  +  e202^  +  e^e^{u-^-U2)^  ^  (5.7b) 
R(x)     =  ejR^(x)  +  e2R2(x) ,  (5.7c) 

where  u^,       ,  i  =  1,2,  are  the  means  and  variances,  respectively,  and 
the   subscripts    indicate   the   types    of   the   intervals.  The 
autocovariance  function  of  a  two-state  semi-Markov  process  can  be 
shown  to  be 
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Cov(X..  X.^^)  =  (^^-^2)^6^626'^  .  k  =  1,2   (5.8) 

where 


3  =      +      -  1  (5.8a) 


and  therefore  the  autocorrelation  function  can  be  written  as 


in  which 


 2  2  2^ 

^  ^2^2    ^  e^e2(y^-y2) 

=  c6^ 


2 

(y-^-M2)  6^62 


2  2  2' 

^I'^l       ^2^2  6i®2^^r^2^ 


(5.9a) 


Consequently,  the  spectral  density  function  of  the  intervals,  given  in 
terms  of  is 


1 

f^(oj)  =          {1  +  2  z  p.  cos(kco) }  ,  0  <^  CO  <  TT,  (5.10) 

It  k=l 


which  takes  the  form 

1  gcosw  -  e 

f+(co)  =         {1  +  2c  ^   >»  (5.11) 

IT  1  +  3    -  2SCOSC0 
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where  c  is  defined  in  (5.9a)  and  e  in  (5.8a). 

Notice  that  the  autocovariance  function  of  the  intervals  of  a 
two-state  semi -Markov  model  and  therefore  the  power  spectrum  depend  on 
the  pdf's  fj^(x)  and  f2(x)  only  through  their  means  and  variances. 
This  can  provide  a  helpful  first  check  for  the  appropriateness  of  a 
semi-Markov  structure  for  a  series  of  events,  since  no  assumption 
about  the  pdf's  of  the  intervals  is  required. 
5.1.2    Counts  Properties 

Cox  (1963)  first  showed  that  the  Laplace  transform  of  the 
conditional  intensity  function  of  a  two-state  semi-Markov  model  is 
given  as 

*  e^f^*(s)  +  e2f2*(s)  +  (1  -  a^  -  a^)  f^*(s)f2*(s) 

^  (s)  *  *  *  *  ,  (5.12) 

1  -  a^fj  (s)  -  a2f2  (s)  -  (1  -  a^  -  d^)       {s)f^  (s) 

*  * 

where  f^^  (s)  and  f2  (s)  are  the  Laplace  transforms  of  the  pdf's  f^(x) 
and  f2(x),  respectively.    Explicit  formulae  for  h(t)  exist  whenever 
the  inversion  of  (5.12)  is  possible.     Given  h(t),  all  the  other 
properties  of  counts  can  be  obtained  from  the  following  general 
relationships: 

Y^.(t)  =  m[h(T)  -  m],  (5.13) 
.t 

t)  dx,  (5.14) 


H(t)  =  /h( 

0 


*  m         2h*(s)m  2m^ 

V  (s)  =  — p-  +   p  (5.15) 

s^  s"^  s-^ 
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*  ★ 

g+(w)  =          [1  +  h  (i^)  +  h  (-iui)],  (5.16) 

ir 

(see  Cox  and  Lewis,  1978,  for  proofs).    In  the  above  equations,  the 
rate  of  occurrence,  m,    is  given   in  terms  of  the  transition 
probabilities  as 

1                       2  -  a,  -  a^ 
m  =    =  ^  .  (5.17) 

^1^1     ^2^2  "  ^2^'^1         "  ''l)*^2 

5.2    A  Discrete  Semi-Markov  Model  for  the  Daily  Rainfall  Occurrences 
Daily  rainfall  occurrences  are  the  result  of  the  interaction  of 
several  rainfall  generating  mechanisms.    For  example,  the  first  rainy 
day  in  a  wet  period  may  be  the  result  of  a  frontal  storm  passing  over 
a  region,  whereas  subsequent  rainy  days  in  the  same  wet  period  may  be 
just  aftereffects  (secondary  events).    In  that  sense,  times  between 
events  may  come  from  different   probability  distributions,  for 
instance,  one  with  a  smaller  coefficient  of  variation  for  the 
secondary  events,  and  one  with  a  large  coefficient  of  variation  for 
the  primary  events.     The  sequence  of  event  types  is  governed  by 
transition  probabilities,   with   higher   probabilities   of  having 
secondary  events  after  a  primary  event  or  after  a  small  number  of 
secondary  events. 

In  view  of  this,  a  two-state  semi-Markov  model  is  proposed  for 
the  daily  rainfall  occurrences,  in  which  the  times  between  events  are 
sampled  from  two  different  geometric  distributions  with  parameters  p^ 
and  p^,  according  to  a  Markov  chain  with  the  transition  probability 
matrix  P_  of  (5.1).     The  notation  SMGG  will  be  used  to  denote  a 


67 


two-State  semi -Markov  model    (SM)  with  both  type  1  and  type  2 
interarrival   times   having  geometric  distributions   (GG).  The 
statistical  properties  of  intervals  and  counts  for  a  SMGG  process  are 
derived  below. 

Let  f, (x)  and  f9(x),  defined  as 


be  the  continuous  representations  of  the  geometric  probability  mass 
functions  (pmf)  of  the  interarrival  times.    Notice  that  f^(x)  and 
f2(x)  are  probability  density  functions  (pdf).     The  statistical 
properties  of  intervals  and  counts  of  a  SMGG  process  are  given  in  the 
following  propositions. 

PROPOSITION  1:     The  moment  generating  function  of  the  interarrival 
times  of  a  SMGG  process  is  given  as 


00 


k-1 


f.(x)  =  I  p.(l  -  p. 
^         k=l  ^  ^ 


6(x  -  k)  ,  for  i  =  1,2 


(5.18) 


1 


ijj(s)  = 


•[(1-32) 


3  Ml-a^) 


7].  (5.19) 


(2-a^-a2) 


1-(1-Pl)e 


1-(1-P2)e 


Moments  of  the  interarrival  times  are  obtained  from 


E(X'')  =  (-1) 


s=o 


(5.20) 
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Proof:    The  proof  follows  immediately  from  (5.6)  by  noting  that  the 
moment  generating  function  of  a  geometric  distribution  with  parameter 
p,  is  given  as 


l-(l-p)e' 


Corollaries:    The  mean  interarrival  time  of  a  SMGG  process  is 

1         1-a,  1-a, 

E(X)  =    [  ^  +   (5.21) 

(Z-a^-a^)     p^  P2 

The  variance  of  the  interarrival  times  of  a  SMGG  process  is 

1  1-p,  1-p.  (l-a,)(l-aj    1        1  „ 

var(X)=  [(1-aJ  ^(1-a,)  |h  ^-  ^(  )2]. 

(2-a^-a2)  p^^  p^^    (2-a^-a2)  p^ 

(5.22) 

The  survivor  function  of  the  interarrival  times  of  a  SMGG  process  is 


1 

R(x)  =   [(l-aj(l-p  )^  +  (l-a,)(l-p.)^],  x=l,2.... 

(2-a,-aJ         ^        ^  12 

^    ^  (5.23) 


Equation  (5.23)  follows  immediately  from  (5.7c)  by  noting  that  the 
survivor  function  of  a  geometric  distribution  is  given  as  (1-p)^. 
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PROPOSITION  2:    The  autocorrelation  function,  P|^,  of  the  interarrival 
times  of  a  SM6G  process  is  given  as 

=  c(a^+a2-l)^  (5.24) 

where 

{l-a^){\-a^)  (Pl-P2)^ 

2-a^-a2        {l-a2) (l-p^) P2^+{l-a^ ) (l-p^) p^^+(l-a^ ) (l-a2) (p^-P2)^ ' 

(5.25) 

Proof:    Eq.  (5.25)  follows  from  (5.9)  after  substituting  the  means  and 
variances  of  the  two  geometric  distributions  as  functions  of  the 
parameters  p^  and  P2. 

Note  on  terminology:    For  a  discrete  point  process,  we  introduce  the 
term  conditional  probabilities  of  occurrence  for  the  discrete  sequence 
of  conditional  probabilities  ^hj^} ,  k  =  1,2,...,  analogously  to  the 
conditional  intensity  function  h(t)  of  a  continuous  point  process. 
The  relationship  between  h(t)  and  hj^  is  simply 

00 

h(t)  =    E  h,   5(t-k),  (5.25) 
k=l  ^ 

where  &{-)  is  the  Dirac  delta  function.     The  interpretation  of  {h^} 
remains  the  same  as  in  the  continuous  case;  values  of  hj^  greater  than 
the  constant  (unconditional)   probability  of  occurrence  m  imply  a 
greater  likelihood  of  having  an  event  at  time  (t+k)  due  to  an  event  at 
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time  t.  The  conditional  probability  of  occurrence  sequence  {h^},  from 
which  all  the  other  statistical  properties  of  the  counting  process  can 
be  derived,  is  given  in  the  following  proposition. 


PROPOSITION  3:    The  conditional  probability  of  occurrence  sequence  ih^} 


of  a  SMGG  process  is  given  as 


h,^  =  m  +  AW*^        ,  k  =  1,2,... ,  (5.26) 


where 


^iPl     ^2^2  "  ^^'^^^ 


and 


W  =  l-p^(l-a^)-p2(l-a2).  (5.28) 

The  equilibrium  probabilities  appearing  in  (5.27)  are  given  in  (5.4) 
and  the  mean  intensity  of  the  process,  m,  (i.e.,  the  unconditional 
probability  of  occurrence  of  an  event),  can  be  given  in  terms  of  the 
transition   probabilities   and  the  parameters  of  the  geometric 
distributions  as 


p, pp(2-a,-ap) 

^  ^       ^    ^  (5.29) 


p^(l-a^)+P2(l-a2) 
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* 

Proof:    The  Laplace  transform,    h{s),  of  the  intensity  function,  h(t), 

* 

is  obtained  by  substituting  in  (5.12)  the  expressions  for  f.(s), 

i  =  1,2  ,  which  are  the  Laplace  transforms  of  f^(x),  i  =  1,2,  defined 

in  (5.18) ,  i.e., 

fAs)  =  r.  (5.30) 

'  1-(1-Pi)e-' 

* 

After  lengthly  algebraic  manipulations,    h(s)  is  inverted  to  give  h(t) 
whose  discrete  analogue  is  hj^  of  (5.26).     More  details  on  this 
derivation  are  given  in  Appendix  B. 

REMARK  1:    The  conditional  intensity  function  h(t)  of  a  SMGG  process 
tends  monotonical ly  to  the  mean  rate  of  occurrence  m,  as  t  becomes 
large.    Specifically,  h(t)  decreases  geometrically  to  the  constant 
intensity  m,  since  A  can  be  shown  to  be  positive  and  0  <  W  <  1.  This 
implies  that  the  semi-Markov  process  exhibits  clustering.  Although 
the  shape  of  the  conditional  intensity  function  is  only  indicative  of 
the  presence,  but  not  the  type,  of  clustering,  the  fact  that  the 
coefficient  of  variation  of  the  interarrival  times  can  take  values 
greater  or  less  than  one  (see  equations  5.21  and  5.22)  suggests  that 
the  semi-Markov  process  can  accommodate  rainfall  occurrence  structures 
which  the  Neyman-Scott  process  and  doubly  stochastic  Poisson  processes 
cannot. 

COROLLARY  1:    The  expected  number  of  events  in  an  interval  of  length 
t,  given  that  the  interval  starts  with  an  event,  is  given  as 
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H(t)  =  mt  +  A    Z  W 
k=l 


=  mt  +  AW   

W-1 


,  t  =  1,2, 


(5.32) 


where  the  parameters  m,  A  and  W  have  been  defined  in  (5.29),  (5.27), 
and  (5.28),  respectively,  and  t  is  referring  to  discrete  time  units. 

Proof  of  Corollary  1 :    The  mean  function  H(t)  of  a  continuous  point 
process  is  defined  as  the  integral  of  the  conditional  intensity 
function  h(t)  in  (5.14).    For  discrete  point  process  we  can  write  by 
analogy, 


from  which  (5.32)  follows  imnediately. 

COROLLARY  2:  The  variance  of  the  number  of  events  in  an  interval  of 
length  t,  V(t),  where  the  interval  starts  with  an  event,  is  given  as 


t 

H(t)  =    Z  h 
k=l 


Z  (m  +  Aw"^) , 
k=l 


(5.33) 


V(t)  =  mt  -  m'^t'^  +  2m  z  (t-k)h 

k=l 


(5.34) 


where  h.   is  given  by  (5.26). 
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* 

Proof  of  Corol lary  2:    The  Laplace  transform  V  (s)  of  the  variance 
time  curve  is  given  in  terms  of  the  Laplace  transform  of  the 
conditional  intensity  function  by  (5.15).    By  taking  inverse  Laplace 
transforms: 

^* 

V(t)  =  mZ~^(^)  -  2m^-^"^(^)  +  Z^t'h-^l, 
s^  s-^  s*^ 


and  therefore 


t  o 

V(t)  =  mt  -  m^t^  +  2m  jj  h(T)dTda  , 

0  0 


which  in  the  discrete  domain  can  be  written  as 


V(t)  =  mt  -  m^t    +  2m    E  Eh. 

i=l  k=l  ^ 


2  2 

=  mt  -  m^t    +  2m    E  (t-k)h.  . 

k=l  ^ 


This  completes  the  proof  of  Corollary  2. 


PROPOSITION  4:  The  spectrum  of  counts,  g^(ca),  of  a  SMGG  process  is 
given  as 
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m  W  -  cosco 

9M  =          [1  -  m  -  2A  (5.35) 

IT  1  -  2WCOSC0  + 

where  m.  A,  and  W  have  been  defined  previously. 

Proof:     Eq.   (5.35)   is  derived  by  substituting  into  (5.16)  the 
* 

expression  of  h  (s)   from  (5.12)   and  performing  lengthy  algebraic 

*  * 

manipulations.  Expression  of  f^  (s)  and  f^  (s)  needed  in  (5.12)  are 
obtained  from  (5.30). 

5.3  Discussion 

In  this  chapter  a  discrete  semi-Markov  model  was  introduced  and 
its  statistical  properties  derived.    It  was  seen  that  the  model  has 
considerable  flexibility  (see  Remark  1)  in  the  sense  that  it  can  model 
structures  with  different  types  of  clustering.    It  remains  to  explore 
parameter  estimation  methods,  and  to  apply  the  model  to  observed  daily 
precipitation  series.    In  the  next  chapter,  methods  for  fitting  the 
model  are  studied. 


CHAPTER  6 

FITTING  THE  DISCRETE  SEMI-MARKOV  MODEL 


The  Fifth  who  chanced  to  touch  the  ear. 
Said:    "E'en  the  blindest  man 

Can  tell  what  this  resembles  most; 
Deny  the  fact  who  can, 

This  marvel  of  an  Elephant 
Is  very  1  ike  a  fan  !" 

The  discrete  semi-Markov  model  developed  in  Chapter  5  has  four 
parameters:     a^         p-^ ,  and  p^.     These  parameters  are:  a^,  the 
transition  probability  from  type  1  to  type  1   interval;  a^* 
transition  probability  from  type  2  to  type  2  interval;  p^ ,  the 
parameter  of  the  geometric  distribution  of  the  type  1  intervals;  and 
P2  the  parameter  of  the  geometric  distribution  of  the  type  2 
intervals.    Note  that  the  type  1  and  type  2  intervals  are  in  general 
indistinguishable  from  each  other  by  direct  observation  of  the  series 
of  daily  rainfall  events.    Thus,  the  transition  probabilities  a^  and 
a2  cannot  be  estimated  directly  from  the  data,  but  instead  have  to  be 
estimated  together  with  the  parameters  of  the  two  geometric 
distributions,  p^  and  p^.     The  estimation  methods  studied  are  the 
method  of  moments  (MOM)  and  two  approximate  maximum  likelihood  (ML) 
estimation  methods. 

6.1    Method  of  Moments 
The  first  three  moments  and  the  lag-one  covariance  of  the 
interarrival  times  of  the  semi-Markov  model  SMGG  are  given  as 
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functions  of  the  parameters  a^ ,  a^,  p^  and  p^  as  follows: 


1        l-a«  1-a, 

E(X)  =   [  ^  +   ^-1,  (6.1) 

2-a^-a2     p^  P2 


.          1         (l-aJCZ-pJ  (l-a,)(2-P2) 
E(x2)  =    [  ^  +   L_  (6.2) 

2-a^-a2  p-^  Pg 


3          1        (l-a2)(p,2-  p,+6)     (l-a,)(p2^-  p.+6) 
E(x^)  =    [  2  1_  1  ^  1  2  2  (g^3) 

2-aj-a2  p^  P2 


1  1        1  2 

=  2    (l-ai)(l-a2)(  )  (a^+a2-l).  (6.4) 

(2-a^-a2)  p^  Pg 


The  above  four  equations  can  be  numerically  solved  for  a^  a2,  p^  and 
P2  using,  for  instance,  the  Newton-Raphson  method.    Since  all  four 
parameters  are  probabilities,  they  must  lie  inside  the  interval  [0,1]. 
Therefore,  a  transformation  was  applied  to  these  parameters  to 
unconstrain  them,  and  the  search  carried  out  in  the  unconstrained 
space.    Denoting  by  y  the  real  parameter,  y£(£,u),  and  by  y'  the 
unconstrained  parameter,  y' £(-»,+»),  the  following  transformation  was 
used: 

y  =  £  +  (u-£)  sin2(y')  (6.5) 


where  i  and  u  are  the  lower  and  upper  bounds  on  the  parameters.  The 
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values  of  z  =  0.01  and  u  =  0.99  were  used  to  avoid  numerical  problems 
at  the  bounds. 

Due  to  the  long  tail  of  the  probability  distribution  of  the 
interarrival  times,  the  use  of  the  third  moment  in  the  estimation  is 
not  desirable.    A  modified  method  of  moments  estimation  which  involves 
the  median  instead  of  the  third  moment,  was  tested.    The  median  of 
the  pdf  of  the  interarrival  times  of  the  semi-Markov  model  SMGG,  is 
given  by  the  following  equation: 


  [(l-aj(l-pj     +  (l-aJd-p.)  '"]  =  0.5.  (6.6) 

(2-aj-a2)         ^        ^  ^  ^ 

Modified  method  of  moments  estimates  were  then  obtained  by  simply 
substituting  (6.6)  for  (6.3). 

6.2    Approximate  Maximum  Likelihood  Estimates  (MLE) 
Let  I.,  i  =  1,2,...  n,  denote  the  type  of  the  ith  interval  in  one 
realization  of  length  n  of  the  point  process.    Then,  I.e{I,II},  where 
I  stands  for  type  1  interval  and  II  for  type  2  interval.    Let  ]_  also 

denote  the  vector  (1^  I2  1^)^,  that  is,  the  vector  of  the  types 

of  intervals  of  all  n  interarrival  times  of  the  given  realization. 
The  general  form  of  the  likelihood  function  of  a  two-state  semi-Markov 
model  can  be  expressed  as: 

L(x)  =    E    p(x^,  x^|I^,  ...  I^)  p(I^,  ....  I^) 

"  (6.7) 

n  n-1 

=  z   n  p(x.|i.)   n  p(i.,,  |IJ  p(ij 

I  i=l       ^    \  i=l  ^  ^ 

~    ^(A)^         ^  (B) 
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where  the  summation  is  over  all  possible  vectors  1,  i.e.,  over  all 
possible  vectors  of  length  n  formed  by  the  two  elements  I  and  II. 
Observe  in  the  above  expression  that  the  term  (A)  depends  only  on  the 
vector  of  interarrival  times  x  =  (x^,  x^,         x^)^  and  the  parameters 
of  the  two  geometric  distributions  p^  and  p^,  while  the  second  term 
(B)  depends  only  on  the  transition  probabilities,  a^ ,  and 
Markov  chain  of  intervals,  i.e., 

L(x)  =    E    f(x.Pi,P2)  f(ai.a2)  (6.8) 

where  f(-)  denotes  function  of  (•)•    It  becomes  apparent  from  (6.7) 
that  the  likelihood  function  of  the  semi-Markov  model  cannot  be 
expressed  in  a  tractable  closed  form  as  function  of  the  parameters  a^ 
a^,  Pp  p^  and  the  vector  of  observations  x-     Although  numerical 
evaluation  of  the  likelihood  function  is  possible,  it  is  infeasible 
for  typical  sample  sizes  of  several  hundred  values,  since  it  requires 
double  summations  over  all  possible  vectors  U 

In  view  of  the  above,   an  approximate  maximum  likelihood 
estimation  procedure  has  been  developed.    This  procedure  consists  of 
two  steps.    The  first  step  involves  the  maximum  likelihood  estimation 
of  e^,  p^,  and  p^,  where  e^  is  the  equilibrium  probability  of  the 
Markov  chain  of  intervals.    Given  the  equilibrium  probability  e^ ,  the 
transition  probabilities,  a^  and  a^,  of  the  Markov  chain  of  intervals 
are  subsequently  obtained.     It  is  understood  that  although  the 
parameters  p^  and  exact  maximum  likelihood  estimates,  the 

parameters  a^  and  ^2  ^""^  and  thus  the  method  is  termed  approximate 
maximum  likelihood.    Details  of  this  method  follow. 
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The  probability  mass  function  (pmf)  of  the  interarrival  times  of 
the  semi-Markov  model  is  given  as: 

p{x)  =  e^p^(l-p^)''-^  +  (l-e^)p2(l-P2)''"^  (6.9) 

where  e^   is  the  equilibrium  probability  of  the  Markov  chain  of 
intervals,  i.e.,  the  unconditional  probability  of  any  interval  being 
of  type  1.    The  log-1 ikel ihood  function  L'(x)  is 

n  n 
L'(x)  =  ln[L(x)]  =  ln[  n  (p{x.))]  =  E  ln[p(x.)] 

i=l       ^  i=l 

(6.10) 

n  x.-l  x.-l 

=        ln[e^p^(l-p^)        +  (l-e^)p2(l-P2)  ]• 

Parameter  estimates  for  e^ ,  p^  and  P2  can  be  obtained  by  maximizing 
L'(x),  for  instance,  using  the  simplex  method  of  Nelder  and  Mead 
(1965).    The  optimization  is  carried  out  in  the  unconstrained  space 
using  again  the  transformation  (6.5).    Estimates  of  the  parameters  a^ 
and  a2  can  subsequently  be  obtained  by  one  of  the  two  methods 
described  below. 

6.2.1    Estimation  of  the  Transition  Probabilities  Using  the  First 
Autocorrelation  Coefficient 

The  first  autocorrelation  coefficient  of  the  semi-Markov  model  is 

given  as 

r^  =  c(a^  +  a2  -  1),  (6.11) 
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where 


e^(l-ej)(l/Pj  -  1/P2)^ 

c  =   J,  (6.12) 

1-p  1-p.  1  1 

e^-jMl-ei)— f  ^  ei(l-ei)(— -— ) 

Pi  h  h  P2 


and  e^  is  the  equilibrium  probability  of  the  Markov  chain,  given  in 
terms  of  the  transition  probabilities  as 

e^^  =  (l-a2)/(2-a^-a2).  (6.13) 

Equations  (6.11)  and  (6.13)  can  be  solved  for  a^^  and  a^,  giving 

^1  "  (l-ei)(ri/c  +  1)  +  2e^  -  1 

and  (6.14) 

^2  ^  ^i^^i^^  +  1)  -  2ej  +  1. 

From  the  above  two  equations  it  can  be  shown  that  for  acceptable 
parameter  estimates,  that  is,  0  <  a^  a2  <  1,  the  following  inequality 
must  hold: 


e        1  ~  e  r 

-min(— ,   ^)  <        <  1.  (6.15) 

l-e^       e^  c 
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Note  that  the  value  min(e^/(l-e^)  ,  (l-e^)/e^)  =  min  (e^/e2,  62/6^) 
corresponds  to  the  ratio  of  the  smallest  to  the  largest  equilibrium 
probability,  a  value  always  less  than  1.     The  inequality  (6.15), 
therefore,  is  consistent  with  the  requirement  that  the  autocorrelation 
function  of  the  process,  given  as 

r^  =  c(a^  +  a^  -  l)""  (6.16) 

is  less  than  one  in  absolute  value.    Note  also  that  the  equal  signs  in 

(6.15)  are  not  permitted  since  from  (6.11)  they  can  be  shown  to 

correspond  to  the  following  trivial  cases.    The  right  hand  side  equal 

sign  implies  a^  +  a2  =  2,  and  therefore      =  1  and  a2  =  1,  in  which 

case  the  system  remains  forever  in  the  initial  state.    The  left  hand 

side  equal  sign  implies  a^  =  0  and  a2  =  0,  in  which  case  the  system 

alternates  deterministical ly  between  the  two  states  and  given  the 

initial  state,  the  behavior  of  the  system  is  non-random. 

Therefore,  estimates  for  a^  and  a2  cannot  be  obtained  by  the 

above  method,  unless  the  ratio  of  the  estimated  first  autocorrelation 

coefficient  r^  to  the  value  c,  satisfies  (6.15).    The  value  of  c  is 

obtained  from  (6.12)  using  the  values  of  e^ ,  p^  P2  estimated  from  the 

approximate  maximum  likelihood  function.    It  was  found  that  (6.15)  was 

not  satisfied  in  general,  and  therefore  the  transition  probabilities 

cannot  always  be  estimated  using  this  method. 

6.2.2.    Bayesian  Approach  to  Estimation  of  the  Transition 
Probabil  ities 

Let  5,  .  ,  Co  ^  denote  the  conditional  probabilities  of  an 
interval  having  length  x.  given  that  it  is  of  type  1  (I)  or  type  2 
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(II),  respectively.  In  view  of  the  geometric  distributions  for  the 
two  types  of  intervals,  these  probabilities  can  be  written  as 


x.-l 

^l,i  "  P(^=^il^)  =  Pi(l-Pi) 
and  (6.17) 

x.-l 

The  conditional  probabilities  of  an  interval  being  of  type  1  or  type  2 
given  that  it  has  length  x=x.  ,  i.e.,  p(l|x=x^.)  ,  p(II  |x=x^.)  can  now 
be  determined.    Using  Bayes  theorem 


p(I,  x=x.)       p(I)  p(x=x.  ID  6,^1 

p(l!x=x.)  =   =  =    ^  ^'^  (6.18) 

p(x=x^.)  p(x=x.)  p(x=x.) 


and  analogously. 


ep^P        (1-e,  )C2  i 

P(lllx=x.)  =  =   

p(x=x.)  p(x=x.) 


(6.19) 


where  ^,  •  and  ^  .  are  given  in  (6.17)  as  functions  of  the  parameters 
p^  and  p^,  and  p(x=x^)  is  given  from  (6.9)  for  x=x^. 

The  transition  probabilities  a^^  and  ag  can  be  estimated  as 


n-1  n 

a,  =  p(l|l)  =    E    p(I|x.  ^  I|x.  ,)  /    E  p(l|x.) 
i  i=l         ^  i=l  ^ 
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n-1  n 
I    Pdix^)  Pdlx^^.^)  /  E    p(l|x.)  (6.20) 


i=l 


i=l 


where  p(l|x.),  p(l|x._^^)  are  given  in  (6.18),  and  p(x=xp,  p(x=x.^^) 
in  (6.9). 

6.3    Monte  Carlo  Tests  of  Estimators 
The  fitting  methods  considered  are  basically  variations  of  method 
of  moments  (MOM)  and  Maximum  Likelihood  (ML)  estimation  methods.  In 
particular,  the  following  five  methods  were  tested  for  efficiency  and 
consistency  in  estimation: 

Ml:    MOM  on  E(X),  E(X^),  E(X^)  and  Cov(l)  ^  a^  i^,  p^ , 

M2:    MOM  on  E(X),  E(X2),  median  and  Cov(l)  ^  a^  a^'  P2 

MS:    MOM  on  E(X),  E(X2),  E(X^)  ^  e^ ,  p^ ,  P2;  coupled  with  r^  ^  a^  a2 

M4:    ML     e^ ,  p^ ,  P2;  coupled  with  r^     a^  ^2 

M5:    ML     e^ ,  p^ ,  P2 ;  Bayesian  approach  ->  a^ ,  ^2  • 

Recall  from  Chapter  5  that  depending  on  whether  a^  +  a2  >  1  (or 
<  1),  the  first  and  all  the  odd-lagged  autocorrelation  coefficients  of 
the  interarrival  times  become  positive  (or  negative).    Most  of  the 
daily    rainfall    structures    analyzed    exhibited    a  positive 
autocorrelation  structure  of  intervals,  although  few  significant 
lag-one  autocorrelation  coefficients  were  present  as,  for  example,  for 
the  station  of  Denver.    In  view  of  the  above,  Monte  Carlo  tests  of 
estimators  were  performed  for  sets  of  parameter  values  for  a,  and  a^ 
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such  as  both  of  the  above  model  structures  are  covered. 

The  first  set  of  parameters  tested  was  {  a^  =  0.4,  a^  =  0.3, 
=  0.8,  P2  =  0.2  }  .     These  parameter  values  correspond  to  an 
occurrence  process  with  a  mean  interarrival  time  of  2.98  days,  a 
standard  deviation  of  3.59  days  (coefficient  of  variation  c^  =  1.2),  a 
skewness  coefficient  equal  to  3.01,  and  a  first  autocorrelation 
coefficient  r^  =  -0.08.    The  conditional  intensity  function  for  these 
parameters  takes  the  form  hj^  =  0.335  +  0.186(0.38)  ,  which  indicates  a 
clustering  of  counts.    Such  statistics  are  representative  of  daily 
rainfall  occurrence  processes,  as  can  be  seen  from  Table  A. 2  of 
Appendix  A.    Five  hundred  synthetic  sequences  of  50,  100,  200,  500  and 
800  events  were  generated  from  a  semi-Markov  model  with  the  above 
parameters.    The  implied  rate  of  occurrence  of  the  process  is 
m  =  1/2.98  =  0.34  days"''',  and  therefore  these  sequences  correspond  to 
approximately  150,  300,  600,  1500,  and  2400  days  of  observation, 
respectively. 

The  five  methods  (Ml,  M2,  M3,  M4,  and  M5)  discussed  previously 

were  fitted  to  all  synthetic  sequences.     The  bias  and  standard 

deviation  of  the  estimated  parameters  are  given  in  Table  6.1.    As  was 

expected,  the  consistency  (bias)  and  efficiency  (variability)  of  the 

estimators  improve  with  the  number  of  events  available  for  the 

estimation.    The  best  estimators  in  terms  of  root  mean  square  error 
?  1/2 

(RMSE  =  ((bias)    +  variance)  ')  were  methods  M4  and  M5  which  are  the 
two  approximate  maximum  likelihood  estimation  methods  using  the  first 
autocorrelation  coefficient  and  a  Bayesian  approach,  respectively. 
Method  M4  has  a  low  bias  but  a  large  variance  as  compared  to  method  M5 
which  has  a  larger  bias  but  a  much  smaller  variance  for  the  parameters 
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and  a2.    It  is  also  observed  from  Table  6.1  that  in  terms  of  RMSE 

method  M4  is  the  best  for  large  sample  sizes  (larger  than  500  events) 

whereas  method  M5  is  the  best  for  small  sample  sizes.     This  was 

expected  given  that  method  M4  involves  an  estimate  of  the  first 

autocorrelation  coefficient  of  the  intervals.    In  addition,  method  M5 

always  gives  a  fit,  whereas  method  M4  failed  in  a  number  of  cases. 

The  second  set  of  parameters  tested  was  {a^  =  0.9,      =  0.6, 

p^  =  0.8,  pg  =0.4}  .    These  parameters  correspond  to  an  occurrence 

process  with  mean  interarrival  time  1.5  days  (mean  rate  of  occurrence, 

m  =  0.667  days'^),  a  standard  deviation  of  1.11  days  (coefficient  of 

variation  c^  =  0.74),  and  skewness  coefficient  c^  =  4.02.  The 

k-1 

autocorrelation  function  of  the  process  is       =  r^(a^  +  a2  -  1)  , 
where  r^  =  0.1,  and  (a^  +  32  -  1)  =  0.5,  and  the  conditional  intensity 
function  is  h|^  =  0.667  +  0.05(0.76)"^.    These  functions  indicate  a 
strong  dependence  stucture  in  the  intervals  but  a  relatively  small 
clustering  in  the  counts.    The  results  of  the  estimation  methods  are 
shown  in  Table  6.2.    Method  M5  performed  poorly,  whereas  method  M4 
gave  satisfactory  parameter  estimates.    These  results  suggest  that 
method  M4  may  perform  better  when  a  strong  autocorrelation  in  the 
interarrival  times  is  present,  and  method  M5  when  the  clustering  of 
counts  is  the  more  significant  element  of  dependence. 

The  effect  of  the  first  autocorrelation  coefficient  of  the 
process  on  the  consistency  and  efficiency  of  the  estimators  a^ ,  a^,  p^ 
and  P2  was  also  tested.     For  the  discussion  that  follows,  the 
convention  is  made  that  e,  corresponds  to  the  geometric  distribution 


s~ 

0) 
■M 
(U 
E 

i- 

10 

a. 


-o 

o 
s: 

> 
o 

lO 


0)  CM 
to  • 
O 

II 

s_ 

O  CM 
M-  Q. 

LO  "O 

s-  c 

O  fD 


_E  00 

+J  o 

(/> 

UJ  II 

C  >— I 

O  Ql 

(/I  «- 
-i-j  ro 

3  o 

I/) 

dJ  II 

o  re 

o  • 

o 


c 
o 


J3 
(O 


II 


0. 

c 

■c 
I. 


o  ID  cc 
ur.  cc  1^  cc 

^  ^  \C  LO  IT) 

o  o  c;  o  o 

c  o  o  o  c 


O  CVJ  O 
o  i£)  ro  ro 
o  o  c  c\j  m 

o  o  o  o  o 


i£  I —  ro 

c\j  00  ro  ro  ^ 

o>  CNj  ^  CM  in 

^  Csj  CM  ^  — ^ 

o  o  o  o  c 


od  CM  a^  ^ 

VD  CTi  CM  ^  ^ 

t— I  *r  f*^  CO  ir> 

CM  CM  CM  CM  ^ 

o  o  o  o  c 


CC  Id  — •  o 

^  U2  ^  cc 
o  o  — <  o  o 
o  o  o  c  o 

c  c  o  c  c 


CM  ^  «J-  r--  CM 
"C-  o  CM 

.-H       O  CM  .-H 

c  o  c  o  o 
o  o  o  o  o 


CM  ^  lo  in  00 
o      in  in 

m  CM     >  C\J  CM 

O  O  O  O  r- 

o  c  o  o  c 


o 

CXi 

ic 

CVJ 

IC 

CVJ 

o 

c 

O 

o 

c  o  o  o  o 


oc  <yi  CO  o 
i£;  CM     ^  o 


O  CT^  lO  CM 

r-H  O  LO  ^ 

ro  m  LO  ro 

o  c  c  c  o 
c  c  c  c  o 


CO  CC  Oo  CVJ 
I —  r**  ro  — I 
— «  g  CC  O  O 

o  o  c  o  o 


m        .  LO  ^ 

*£)  ^  LTl 
CC  r-^  ro  r-^  ^ 
,— .  ^  CM  ■— '  ' 

C  O  O  C  C 


cyi  ^  IX)  CO  ^ 
O      O  CO  o 

O  O  LO  0>  CM 
.— t  CM  CM  ^  r-< 

o  o  c  o  o 


O  "-C  Lf) 

CM  I— I  ^  .-H  in 

o  o  o  o  o 

o  c  o  c  o 

o  o  o  o  o 


CM  m  CM  T3-  CO 

LO  r-  CO  CM  CO 

CM  1^  r-H  ^  ^ 

o  o  o  o  o 

O  O  C  O  CD 


CM  ^  LO  r-.  LO 

Cr       00  ^  — ' 

CM  CM  CO  o  m 

O  O  O  C  r-t 


>—>      CO  ^  ro 

CM  CO  O  CM 

o  o  o  c  ^ 
o  o  o  o  o 


i-«  CM  ro  ^  lO 


CM  lO  00  CM  O 
CM  t— I  CO  •— t  O 

'  CM  cv:  ^  lO 


CO  CM  LO 
LO         O  LO  "sO 
CM  CM  <cr  CM  CM 

o  c  o  o  c 
c  c  o  o  c 


oc  o  LO  o  rn 
,— I  o  ^  cy^ 

^  r-.  X  ^ 

o  ^  o  o 
c  o  c  o  o 


r--.  O  "X)  lO  ^ 
CM  .— t  r*-  O  CM 
CM  LO  r-^  ^  CC 

^  ^  ^  ^  o 

O  O  O  O  O 


^  ^  CM  ro  cc 

ro  OC  LO  CM 

>sC     •  LO  CO 

^  CM  r-^  O 

o  o  c  o  o 


^  a>  C  «T 
^  ID  CTi  O  1— " 

o  o  o  c  o 
o  o  o  o  o 


m  CO  •— 4  k£) 

^  a%  ^  rn 
^  CO  o  o  o 
c  o  o  o  o 

o  o  o  o  o 


CM        CO  CM  lO 

^  LO  LO  LO 

O  i£)  O  ^  CM 

o  o  o  o  ^ 

o  o  o  o  o 


X  CO  CJ^  CT^ 
CM  LO  T-H  CM  O 
^       CO  o  ' 

o  o  o  o 
o  o  o  o  o 


CO  LO  CM  O 
CO  I —  CO  O 
<-H        CM  ^  LO 


8 


CO  o  cji  in 
o  ^  O  00 
CM  c\j  ro  ^  ^ 
o  c  o  o  o 

o  c;  c  o  c 


CO  ^  ^  o  cc 
O  r~  "3-  — < 
r-.  in  in  in  in 
c     c  o 

o  o  o  o  o 


in  CM  ^  lO  o 

CM  m  ^  ic  vo 

^  ^  ro  a-,  in 

— •  «  — •  o  o 

o  o  o  c  c 


O  C  CO  CM  lO 
O  CM  ^  X  1^ 

CM  O  in 

— •  —  — «  ^  o 

o  o  o  o  o 


CM      r*.  ^ 

r-.  in  in  ^  CM 

o  o  o  o  o 

c  c  o  o  o 


m  l£)  CM  o  o^ 

^  lo  f-H  n 

CVJ  r-.  o  O  O 

o  o  o  o  o 
o  o  o  o  c 


o  CM  LO  in  ro 
«3-  o  CO  o  en 

O  U3  O  O  CM 

o  o  c  o 
o  o  o  o  o 


t— 1 1£)     m  CTi 

l£)  O  CT>  *3-  lO 

O  LO      o  o 

o  c  c  o  -< 

o  o  o  o  o 


cyi  CM  r-  o 
CM  «a-  CM  c:; 
CM  ro  ro  V 


lO  CM  ID  1^  CC 
CO  CO  «3-  »T 

O  O  O  O  C 


V  ^  O  CT.  ^ 

<cr  ^  c  CTi  o 
i— t  in  in  CO  ^ 
— <  o     o  c 

o  o  o  o  o 


o  m  LO  oo  CO 
^  CM  o  ro  ^ 
o  cn  ro  ^ 
o  — ■  c  o 

o  o  c  o  c 


ro  CO  CJ^  »r  « 
CM     o  in  in 
CM  O  LD  CO  •a- 
«  ^  o  o 

c  o  o  o  o 


^  Ln  lO  LD  r^ 

o  in  in  ^  ^ 
o  o  o  o  o 
o  o  c  o  o 

o  o  o  o  o 


ro 

CO 

cn 

LO 

CM 

o 

o 

o 

O 

o 

o 

o 

o 

O 

o 

o 

o 

o 

^  LO  f— t  lo 

CM  CM        O  CC 

o  ir>  o  o  CM 
o  o  o  o  ^ 


lO  ^  o  m  o 
o  o  cn  CO 
<— "  LO  o  o  o 
o  o  o  o  ^ 

o  o  o  o  o 


ro  ro  CM  O  O 

t£>  CO  ^  lO  LO 
<— <  CM  CM  CM  CM 


8 

CO 


■D 

01 

-a 

OJ 

OJ 

u 

o 

c 

3 

l/l 

3 

o- 

■o 

01 

o 

t/1 

01 

(J 

E 

OJ 

c 

Ift 

(/) 

<u 

01 

u 

u 

en 

c 

c 

OJ 

01 

c 

s 

3 

m 

o-  a- 

> 

OJ 

OJ 

OJ 

(/I 

l/t 

«*- 

l*- 

o 

o 

o 

^ 

OJ 

01 

^ 

XI 

E 

E 

E 

3 

3 

3 

C 

C= 

C 

II 

II 

II 

z 

E 

"s 

87 


CM 
CL 


o 


> 

Q 

-a 
i- 

03 
+-> 

I/O 


TO 


Q. 


to 


CO 


03 


o 


<Ni  CTl  00  LP 
O        Ol  CO  I — 


o  o  o 


IX) 
CM 

cc 
o 


CM 


CM 


00 

o 

CM 


CM 


O  «^ 

O  CO 
^  CTi 
^  O 


r-~. 
O 


O 


o 
I 


cc 


o 
I 


in 
o 


o  o  o 


ro  to  uo 
CM  ro  ro 
U2  r~~  ^a- 

CM  CM  CM 
•     •  • 

o  o  o 


rO  <Ti  CTi 
O  00 

C^l  CM 
CM  CM  Od 
•     •  • 

o  o  o 


CM  un  ro 
a.  cc  o 
CM  c  o 
I — I  O  « — ' 

•      •  • 

o  c  o 


00  UT) 

^  I —  m 
r-~  CM 
coo 

•       •  • 

coo 


P-.  CM  «^ 

r^>  .-t  cc 

cc  •53-  <a- 

.— 1  1 — ^  CM 


o  c 
I  I 


o 
I 


cn 

LD  CM  CTi 

■-H  OO 

t— I  r-H  CM 

•      •  • 

o  o  o 

I    I  I 


r-H  CM  CO  ^  LO 


.— I  O  O  1— I  c 

cc  CTi  O 

1— 1  r— I  LO 


o 
o 

ID 


o 


CO 

un 

CT. 

o 


o 


CO 

LO 
CM 


o 

CM 
00 


cr. 
oc 


00 

LO 

o 


o 
I 


oc 

CM 


o 
I 


CTi  C7i  <— ( 

LO  «^  O 

oj  Lf) 


O  O  o 


r— I   cc  ■— I 

CM  00 
CM  CTi  CTi 
.-I  O  O 
•     •  • 

o  o  o 


CM  CO  O 
00  I  o 
vc  CO  o 

CM  CM  CM 
•      •  • 

o  o  o 


CTl  CO  UO 
Lf)  O  •— I 

coo 

CM  CM  CM 
•     •  • 

o  o  o 


coo 

Lf)  CO  >— I 
1£)  O  LO 

coo 

•      •  • 

coo 
I  I 


tTi  r-- 
co  cc  cn 

^  CM 

o  o  o 

•     •  • 

coo 


LO  en 

LO  O  CO 

CO  ^  (y-. 

I— I  I— I  CM 


o 
I 


c  o 
I  I 


•a-  o 

1^  CO  CTl 
CTi  C  00 

C  .— I  .-H 

•      •  • 

coo 


^  CM  CO  ^  in 


CTi  o  00  c 
•d-      LO  oc  c 

.—I        CM  CM  LO 

c 
c 

LO 

o 
c 


CTi 
CO 
CC 

c 


in 
c 

o 


CO 

CM 
CM 


LO 

00 


o 

LO 

c 


in 

00 

c 


cr. 
cr. 
o 

• 

o 
I 


I— I  CM  CO  ^  LO 

z:  :^  :z.  2.  2. 


^  O  r-l  CO  C 
LO  CM  LO  O 
CM        CO  CO  LO 

o 
o 

LO 

o 
c 

CM 


88 

with  the  higher  parameter  i.e.,  the  distribution  with  the  shorter 
tail.     A  value  of  e^  >  0.5,  therefore,   implies  that  shorter 
interarrival  times  have  greater  probability  of  occurrence.  The 
parameter  set  considered  is  {e^  =  0.6,  p^^  =  0.9,  P2  =  0.1}.  Depending 
on  selected  values  for  the  first  autocorrelation  coefficient  r^, 
several  sets  of  transition  probabilities  (a^        ^^'^^  obtained  and 
the  corresponding  processes  tested.    Table  6.3  and  Figure  6.1  show  the 
results  of  this  experiment.     It  is  observed  that  the  bias  in  the 
parameters  a^  and  a2  increases  as  a  function  of  r^,  and  that  the  bias 
in  a2  (where  a^  corresponds  to  the  transition  probability  of  the 
longer  tail  geometric  distribution)  is  always  greater  than  the  bias  in 
a^.    In  terms  of  RMSE  it  can  be  concluded  that  method  M4  performs 
better  when  a  strong  dependence  srtucture  is  present  and  method  M5 
when  a  strong  clustering  of  counts  is  present.    Method  M5  seems  to 
give  parameter  estimates  which  are  more  variable  but  less  biased  than 
those  of  method  M4. 

From  the  above  preliminary  analysis,  our  conclusions  regarding 
the  estimators  tested  can  be  summarized  as  follows.    MOM  estimates 
have  a  very  high  likelihood  of  failure  and  should  be  avoided  given  the 
long  tail  distribution  of  the  interarrival  times.    Methods  M4  and  M5 
seem  to  perform  the  best  and  it  is  suggested  that  method  M4  should  be 
used  whenever  feasible,  especially  for  large  sample  sizes  (on  the 
order  of  500  events).    For  smaller  sample  sizes  it  is  suggested  that 
method  M4  is  used  when  the  dependence  of  intervals  is  stronger  than 
the  clustering  of  counts,  and  method  M5  when  the  clustering  is 
stronger  than  the  dependence  in  intervals.     In  view  of  the  above 
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Figure  6.1    Bias  and  variability  of  the  approximate 

maximum  likelihood  estimators  as  functions 
of  the  first  autocorrelation  coefficient. 
Model  is  that  of  Table  6.3. 
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results,  only  methods  M4  and  M5  were  used  for  fitting  the  semi-Markov 

model  to  the  daily  rainfall  occurrences.  These  results  are  given  in 
the  next  chapter. 


CHAPTER  7 

APPLICATION  OF  THE  SEMI-MARKOV  MODEL  TO  DAILY  RAINFALL  OCCURRENCES 


The  Sixth  no  sooner  had  begun 
About  the  beast  to  grope, 

Than,  seizing  on  the  swinging  tail 
That  fell  within  his  scope, 

"I  see,"  quoth  he,  "the  Elephant 
Is  very  like  a  rope!" 


The  analysis  of  the  six  daily  rainfall  records  in  Chapter  4  has 
revealed  that  the  Snoqualmie  Falls  and  Roosevelt  stations  may  be  of 
special  interest.    This  is  due  to  the  tremendous  variability  they 
exhibit  within  a  year,  as  well  as  to  the  non-Poissonian  clustered 
rainfall  occurrence  structures  within  each  of  the  seasons.  In 
addition,   Snoqualmie  Falls   lies   in  a   significantly  different 
climatologic  regime  than  Roosevelt,  and  therefore  these  two  stations 
have  different  underlying   rainfall    generating  mechanisms.  For 
example,  the  mean  interarrival  time  for  Snoqualmie  Falls  ranges  from 
1.34  days  for  December  to  4.26  days  for  July,  while  for  Roosevelt  the 
corresponding  figures  are  4.67  for  July  to  22.52  for  May.    For  these 
reasons,  the  Snoqualmie  Falls  and  Roosevelt  daily  rainfall  sequences 
were  selected  to  demonstrate  the  fitting  of  the  semi-Markov  model. 

7.1    Selection  of  Seasons  and  Seasonal  Statistical  Analysis 
The  objective  of  the  season  discrimination  methodology  is  to 
identify  periods  (seasons)  within  the  year  in  which  the  statistical 
structure  of  the  process  remains  constant,  i.e.,  does  not  vary  over 
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time.    The  statistical  structure  of  the  daily  rainfall  process  is 
completely  characterized  by  the  probability  laws  of  two  properties: 
the  number  of  events  (rainy  days)  within  a  season,  and  the  daily 
rainfall  amounts.    For  the  number  of  events  within  a  season,  the 
relevant  properties  to  be  examined  are  the  probability  distribution 
function  of  the  interarrival  times  and  the  second  order  properties  of 
counts,  specifically  the  spectrum  of  counts,  variance  time  curve  and 
index  of  dispersion.    For  the  non-zero  daily  rainfall  amounts,  the 
property  of  interest  is  the  probability  distribution  function.  All 
these  properties  must  be  examined  in  parallel   for  a  successful 
selection  of  seasons.      It   is   also  understood  that  physical 
considerations  (including  an  understanding  of  the  climatic  conditions 
of  the  region)  and  subjective  judgment  play  an  important  role  in  this 
process. 

For  the  Snoqualmie  Falls  and  Roosevelt  stations,  the  following 
homogeneous  seasons  were  identified  after  careful  examination: 


Snoqualmie  Falls 


Roosevelt 


Season  1 


Jan,  Feb,  Mar 


Jan,  Feb,  Mar,  Apr 


Season  2 


Apr,  May,  Jun 


May,  Jun 


Season  3 


Jul ,  Aug 


Jul 


Season  4 


Sept,  Oct 


Aug 


Season  5 


Nov,  Dec 


Sept,  Oct 


Season  6 


Nov,  Dec 


A  statistical  analysis,  similar  to  that  of  Chapter  4,  was 
performed  on  a  seasonal  basis  and  the  results  are  given  in  Tables 


94 

7.1  -  7.5.    For  the    Roosevelt  station,  30  years  of  data  (1948-1977) 
were  analyzed,  whereas  for  Snoqualmie  Falls  the  analysis  was  performed 
only  on  the  last  15  years  of  the  record  (1963-1977).    The  reason  for 
the  different  record  lengths  is  that  Snoqualmie  Falls  has  a  fairly 
high  rate  of  occurrence,  i.e.,  large  number  of  events  in  each  season, 
which  permits  a  reliable  analysis  for  a  shorter  recording  length,  at  a 
considerable  savings  in  computer  resources. 

Figures  7.1    and  7.2   show   the   properties    of  intervals 
(log-survivor  function)  and  counts  (spectrum  of  counts,  variance  time 
curve  and  index  of  dispersion)  for  the  two  stations.    Comparison  of 
the  seasonal  empirical  curves  with  the  corresponding  curves  for  the 
months  constituting  each  season  (Figures  A.7-A.12  of  Appendix  A) 
revealed  no  significant  differences  confirming  the  selection  of 
seasons. 

7.2    Fitting  the  Semi-Markov  Model  to  the  Daily  Rainfall  Occurrences 
From  the  Monte  Carlo  analysis  in  Chapter  6,  it  was  concluded  that 
the  two  most  consistent  and  efficient  estimation  methods  are  the 
approximate  maximum  likelihood  (ML)  methods  M4  and  M5  (see  Chapter  6 
for  details).    These  methods  will  be  referred  to  in  this  chapter  as 
MLl  and  ML2,  respectively.     It  is  also  recalled  that  although  the 
Bayesian  estimation  method  (ML2)  always  gives  parameter  estimates, 
method  MLl,  based  on  the  first  autocorrelation  coefficient,  gives 
parameter  estimates  if  and  only  if  the  constraint  of  (6.15)  is 
satisfied. 

The  results  of  fitting  the  semi-Markov  model  to  the  seasons  for 
Snoqualmie  Falls  and  Roosevelt  are  shown  in  Table  7.6.  The 
interpretation  of  the  estimated  parameters  for  Snoqualmie  Falls 
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suggest  that  the  interarrival  times  of  the  process  are  sampled  from 
two  geometric  distributions,  one  with  mean  at  the  order  of  1  day 
(p^  -  0.9)  and  the  other  at  the  order  of  2.5  to  7  days  (p^  -  0.4  to 
0.15).    For  this  station,  it  is  also  observed  that  the  transition 
probability  a^  is  always  greater  than  0.5  which  suggests  that  small 
interarrival   times   are  most   likely   to   be   followed   by  small 
interarrival  times,  an  indication  of  clustering.    For  the  Roosevelt 
station,  method  MLl  did  not  give  feasible  parameter  estimates  for  one 
season  (month  of  July),  and  both  methods  gave  a  value  of  p^  at  the 
bound  (p^  =  0.99)  for  three  out  of  six  seasons.    Problems  with  fitting 
the  model  to  this  station  were  expected  given  the  small  number  of 
events  available  for  estimation. 

The  assessment  of  the  goodness  of  fit  of  the  semi-Markov  model 
was  performed  by  comparing  empirical  functions  of  the  data  which  were 
not  used  in  the  estimation  with  their  theoretical  counterparts. 
Figures  7.3  and  7.4  show  these  comparisons  for  some  selected  seasons 
and  stations.    It  is  observed  that  the  theoretical  spectra  of  counts 
are  surprisingly  close  to  the  empirical   ones,   especially  for 
Snoqualmie  Falls.    This  is  a  sign  of  a  good  fit,  given  that  this 
function  was  not  explicitly  used  in  the  estimation.    The  agreements 
for  the  variance  time  curves  generally  is  not  as  good.    This  is  not 
surprising  since  the  estimated  variance  time  curve  is  much  more 
variable  than  the  estimated  spectrum  of  counts.    It  should  be  noted 
that  the  model   does  a  good  job  in  preserving  the  probability 
distribution  of  the  interarrival   times  as  expected,   since  this 
information  is  used  explicitly  in  the  estimation. 
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Table  7.1    Autocorrelation  Coefficients  of  Interarrival  Times-- 
Seasonal  Analysis  {S-  corresponds  to  the  ith  season) 


h 

^2 

h 

^4 

h 

^6 

(a) 

Snoqualmie  Falls 

r^ 

0.047 
-0.026 
0.032 

_n  09? 

—  U  .  \JC.L. 

0.003 

-0.054 
0.074 
0.016 

0.042 

-0.010 
0.057 

-0.022 
0.051 

-0.050 

0.025 
-0.026 

0.032 
-0.053 
-0.042 

0.036 
0.034 
-0.040 
-0.040 
-0.031 

(b)  I 

Roosevelt 

0.009 
0.067 
0.088 
-0.036 
-0.006 

-0.058 
-0.249* 
0.048 
-0.044 
-0.050 

-0.112 
0.073 

-0.008 
0.006 

-0.098 

-0.013 
-0.056 
0.056 
0.073 
-0.073 

-0.051 
0.072 
0.081 
0.008 
0.032 

-0.020 
-0.050 
-0.074 
0.019 
-0.035 

Table  7.2    Statistics  of  the  Interarrival  Times--Seasonal  Analysis 


Number 


Season 

X 

of  Events 

(a)  : 

Snoqualmie  Falls  (15  years) 

1 

1.496 

1.377 

0.920 

4.217 

896 

2 

2.101 

2.603 

1.239 

4.085 

672 

3 

3.715 

5.235 

1.409 

2.924 

246 

4 

2.271 

2.776 

1.222 

3.781 

391 

5 

1.393 

1.125 

0.808 

4.212 

657 

(b)  1 

Roosevelt  (30  years) 

1 

7.821 

13.601 

1.739 

3.437 

502 

2 

18.973 

19.088 

1.006 

0.837 

75 

3 

4.671 

4.759 

1.019 

1.739 

152 

4 

6.052 

10.880 

1.798 

4.345 

191 

5 

8.153 

11.631 

1.427 

2.278 

222 

6 

8.165 

14.565 

1.784 

5.579 

231 
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Table  7.3    Autocorrelation  Coefficients  of  Non-Zero  Daily  Rainfall 
Amounts--Seasonal  Analysis  (S^  corresponds  to  the 
ith  season) 


h 

^2 

h 

^4 

h 

^6 

(a) 

Snoqualmie  Falls 

0.238** 
0.008 
-0.014 
0.011 
0.051 

0.058 
0.016 
-0.07 
-0.026 
-0.010 

0.094 
0.011 
-0.046 
-0.023 
0.012 

0.054 
0.016 
0.057 
0.025 
0.014 

0.130** 

0.023 

0.014 
-0.003 
-0.009 

(b)  Roosevelt 

'I 

0.069 
0.008 
0.029 
-0.043 
-0.034 

0.117 
-0.115 

0.185 
-0.114 
-0.087 

-0.072 
-0.092 
0.042 
0.023 
-0.177* 

0.049 
-0.030 
-0.030 
0.015 
0.074 

0.109 
-0.035 
0.012 
0.057 
-0.025 

0.082 

0.080 

0.212** 

0.011 

0.059 

Table  7.4    Statistics  of  the  Non-Zero  Daily  Rainfall  Amounts--Seasonal 
Analysis 


Season 

X 

(a)    Snoqualmie  Fal 1 s 

(15  years) 

1 

0.373 

0.456 

1.222 

3.019 

2 

0.240 

0.281 

1.170 

2.559 

3 

0.216 

0.274 

1.270 

2.116 

4 

0.311 

0.342 

1.099 

1.963 

5 

0.407 

0.474 

1.164 

2.165 

(b)    Roosevelt  (30  years) 

1 

0.268 

0.312 

1.166 

2.263 

2 

0.217 

0.269 

1.241 

2.099 

3 

0.262 

0.352 

1.343 

2.757 

4 

0.278 

0.367 

1.322 

2.801 

5 

0.345 

0.518 

1.501 

3.240 

6 

0.350 

0.437 

1.248 

2.224 
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Table  7.5    Cross  Correlation  Coefficients  of  the  Non-Zero  Daily 

Rainfall  Amounts  with  Preceding  and  Following  Interarrival 
Times    (X.  =  interarrival  time  following  the  event  P.) 


Season 

(^-2'  Pi) 

(^-1'  Pi) 

(X^.  P^) 

(^.1'  Pi) 

(a) 

Snoqualmie  Falls 

1 

-0.030 

-0  147** 

-0.089** 

2 

-0.069 

-0.087* 

0.068 

0.041 

3 

0.086 

0.079 

-0.206** 

-0.197** 

4 

0.036 

-0.123* 

-0.162** 

-0.062 

5 

0.011 

-0.111* 

-0.091 

-0.065 

(b) 

Roosevelt 

1 

0.003 

-0.043 

-0.093* 

-0.045 

2 

0.013 

0.050 

-0.154 

0.014 

3 

-0.835 

0.019 

-0.014 

-0.051 

4 

-0.035 

-0.019 

-0.007 

0.008 

5 

-0.103 

-0.13 

-0.001 

-0.018 

6 

0.018 

-0.036 

-0.096 

-0.105 
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SEASON  5 


Figure  7.1    Statistical  properties  of  intervals  and  counts  for 
Snoqualmie  Fal 1 s--seasonal  analysis 

I:       Normalized  spectrum  of  counts  vs.  frequency  factor 
II:      Log-survivor  function  vs.  interarrival  time  (days) 
III:    Variance  of  counts  vs.  interval  length  (days) 
IV:      Index  of  dispersion  vs.  interval  length  (days) 
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II 


III 


IV 


SEASON  1 


SEASON  2  .0 
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SEASON  6 


0  lODanKctaosoDworaoaoc 


0    ZD   40   BO   K  100  tzo  i«o  in 


2*0       so  *0C 


160       Z4C       SIC  «00 


Figure  7.2    Statistical  properties  of  intervals  and  counts  for 
Roosevel t--seasonal  analysis 

I:       Normalized  spectrum  of  counts  vs.  frequency  factor 
II:     Log-survivor  function  vs.  interarrival  time  (days) 
III:    Variance  of  counts  vs.  interval  length  (days) 
IV:      Index  of  dispersion  vs.  interval  length  (days) 
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Table  7.6.    Results  of  Fitting  the  Semi-Markov  Model  to  the  Daily 
Rainfall  Occurrences 


Season 

Method 

^1 

Pi 

P2 

^1 

(a)    Snoqualmie  Falls 

1 

MLl 
ML2 

0.776 
0.740 

0.380 
0.279 

0.958 

0.364 

0.735 

2 

MLl 
ML2 

0.599 
0.651 

0.227 
0.326 

0.905 

0.248 

0.659 

3 

MLl 
ML2 

0.534 
0.539 

0.434 
0.430 

0.929 

0.144 

0.549 

4 

MLl 
ML  2 

0.631 
0.601 

0.454 
0.407 

0.916 

0.248 

0.597 

5 

MLl 
ML2 

0.759 
0.407 

0.369 
0.273 

0.971 

0.425 

0.723 

(b) 

Roosevelt 

1 

MLl 
ML2 

0.463 
0.411 

0.568 
0.523 

0.917 

0.075 

0.446 

n 
C 

Ml  1 

nLi 
ML2 

n  n/iQ 

U  .1)43 

0.123 

U  .DDI 

0.680 

n  QQn 

n  263 

3 

MLl 
ML2 

0.271 

0.668 

0.624 

0.165 

0.688 

4 

MLl 
ML2 

0.782 
0.790 

0.183 
0.210 

0.376 

0.053 

0.789 

5 

MLl 
ML2 

0.245 
0.386 

0.556 
0.635 

0.990 

0.080 

0.370 

6 

MLl 
ML2 

0.376 
0.374 

0.551 
0.548 

0.990 

0.075 

0.419 

MLl  =  Approximate  maximum  likelihood  estimates  (MLE)  coupled  with 
the  first  autocorrelation  coefficient 


ML2  =  Approximate  MLE  with  a  Bayesian  approach 
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SEASON  3  SEASON  4 


FREQUENCY  FflCTOR  FREQUENCY  FfCTOR 


SEASON  5 


0       50      ICn     ISO     200     2S0     300     350  4X 
FREQUENCY  FfiCTOR 


Figure  7.3    Comparison  of  empirical  and  theoretical  spectra 

of  counts  for  Snoqualmie  Fall s--seasonal  analysis. 
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SEASON  1 


0       50      100     ISO     200     250     300     350  400 


FREQUENCY  FACTOR 


SEASON  2 


0       50      100     150     200     2S0     300     350  400 


FREQUENCY  FRCTOR 


SEASON  3  SEASON  4 


50      IM     150     200     250     300     350     400  °o       50      100     150     200     250     300     350  400 


FREQUENCY  FBCTOR  FREQUENCY  FBCTOR 


SEASON  5  SEASON  6 


FREQUENCY  FfCTOR  FREQUENCY  FRCTOR 


Figure  7.4    Comparison  of  empirical  and  theoretical  spectra  of 
counts  for  Roosevelt--seasonal  analysis. 
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7.3    Modeling  the  Non-Zero  Daily  Rainfall  Amounts 
Based  on  previous  research  (see  Chapter  2)  and  some  exploratory 
analyses,  the  following  three  marginal  distributions  were  selected  as 
candidates  to  fit  the  daily  rainfall  amounts:    Weibull,  Gamma,  and 
mixed  exponential.    The  properties  and  fitting  procedures  for  these 
distributions  are  discussed  below. 

Weibull  distribution.  The  probability  density  function  (pdf)  of 
the  Weibull  distribution  is: 

„     x-y"-^  x-y" 

f(x)  =          (  )     exp[-(  )  ].  (7.1) 

6-Y    B-Y  S-Y 


where  o,  g,  and  Y  are  parameters  to  be  estimated.    The  mean,  standard 
deviation  and  skewness  coefficient  are  given   in  terms   of  the 
parameters  a,  g,     and  Y  as: 

p  =  Y  +  (B-Y)  r(l+l/ct),  (7.2) 

a  =  (3-Y)  [r(l+2/a)  -  r^d+l/a)]^/^  (7.3) 


r(l+3/a)  -  3r(l+2/a)  r(l+l/a)  +  2r^(l+l/a) 
C.  -  o  TTp  ,  (7.4) 

^  [r(l+2/a)  -  r^(i+i/a)]^/^ 


(see  Kite,  1978),  where  r(0  is  the  usual  gamma  function.    A  method  of 
moments  parameter  estimation  procedure  was  followed.    This  consists  of 
solving  (7.4)  iteratively  for  a,  and  then  solving  (7.2)  and  (7.3)  for 
the  other  two  parameters,  s  and  Y. 
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Gamma  distribution.     The  pdf  of  the  three  parameter  gamma 
distribution  is: 


1        x-Y^"'^  x-Y 

f(x)  =    (  )     exp[-(  )],  (7.5) 

ar(3)      a  a 


where  a,  e,  and  Y  are  parameters  to  be  defined.    The  mean,  standard 
deviation  and  skewness  coefficient  are  given  as 

y  =  a3  +  Y, 

o  =  a/e  ,  (7.6) 

and 

=  2//^  , 

from  which  method  of  moments  estimates  can  be  easily  obtained. 

Mixed  exponential  distribution.    The  pdf  of  a  mixed  exponential 
distribution  is 

f(x)  =  aX^exp[-x^x]  +  (l-a)X2exp[-X2x],  (7.7) 

where  x^   and  x^  are  the  parameters  of  the  two  exponential 
distributions  and  a  is  their  mixing  ratio.    The  mean  and  variance  of 
this  distribution  are 


y  =    + 


a  1-a 

» 


*1        '2  (,.3) 
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P        a        1-a                       1          1  2 
a'^  =  — 2  +  —    +  ot(l-a)  (  )  . 

Everitt  and  Hand  (1981)  suggest  several  methods  of  estimating  the 
parameters  a,  Xp  and  x^.    Here,  the  method  of  maximum  likelihood  was 
used. 

The  log- likelihood  function  of  a  mixed  exponential  distribution 

is 


L  (x)  =  ln[  n  f(x.)] 
i=l  ^ 

(7.9) 

n 

=    E    ln{aX^exp[-X^x.]  +  (l-a)X2exp[-X2X^.] }. 


Estimates  of  the  parameters  were  obtained  using  the  Nelder  and  Mead 
(1965)  simplex  algorithm  for  the  maximization  of  L'(x).    Notice  that 
the  parameter  a  is  a  probability,  and  therefore  should  lie  in  [0,1]. 
The  transformation  (6.5)  was  used  to  constrain  this  parameter. 

All   three  distributions  were  fitted  to  the  non-zero  daily 
rainfall  amounts.     A  visual  comparison  indicated  that  the  mixed 
exponential  gave  the  best  fit,  and  this  distribution  was  subsequently 
used  for  all  seasons.    Figures  7.5  and  7.6  show  the  empirical  and 
fitted  mixed  exponential  cumulative  probability  plots  for  the  fitted 
distributions.    The  parameters  of  the  fitted  distributions  are  given 
in  Table  7.7. 
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Figure  7.5  Empirical  and  theoretical  cumulative  probability 
functions  of  the  non-zero  daily  rainfall  amounts 
for  Snoqualmie  Fal 1 s--seasonal  analysis. 
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SEASON  3  SEASON  4 


Figure  7.6  Empirical  and  theoretical  cumulative  probability 
functions  of  the  non-zero  daily  rainfall  amounts 
for  Roosevel t--seasonal  analysis. 
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Table  7.7    Parameters  of  the  Mixed  Exponential  Distribution  Fitted  to 
the  Non-Zero  Daily  Rainfall  Amounts 


Season 

a 

^1 

X2 

(a)  Snoqualmie  Falls 

1 

0.182 

17.627 

2.257 

2 

0.201 

17.033 

3.504 

3 

0.412 

17.500 

3.065 

4 

0.120 

26.743 

2.855 

5 

0.152 

19.654 

2.123 

(b)  Roosevelt 

1 

0.251 

16.347 

2.963 

2 

0.558 

9.631 

2.797 

3 

0.514 

9.527 

2.317 

4 

0.387 

11.626 

2.514 

5 

0.561 

8.406 

1.577 

6 

0.276 

17.303 

2.163 

no 


7.4    Coupling  the  Models  for  Occurrences  and  Amounts  and  Overall  Model 

Performance 

The  cross  correlation  analysis  of  interarrival  times  and  non-zero 
rainfall    amounts    (Table  7.5)    indicated   that   no  significant 
correlations  were  present  for  the  Roosevelt  station  but  that  small, 
although  significant,  correlations  were  present  for  Snoqualmie  Falls. 
The  significance  of  the  Snoqualmie  Falls  correlations  may  be  due  in 
part  to  the  greater  number  of  events  at  that  station.    If  the  small 
correlations  are  taken  to  justify  an  assumption  of  independence,  this 
implies  that  given  the  occurrence  of  an  event,  the  corresponding  daily 
rainfall  amount  does  not  depend  on  whether  or  not  the  event  was  the 
first  or  last  rainy  day  in  a  sequence  of  rainy  days.    If  independence 
is  assumed,  the  coupling  of  the  rainfall  occurrence  model  with  the 
rainfall  amounts  model  becomes  easy,  since  the  two  processes  are 
simply  superimposed.     For  example,  a  generation  scheme  for  daily 
rainfall  sequences,  would  consist  of  generating  the  position  of  daily 
rainfall  occurrences  from  a  semi-Markov  model,  and  then  assigning  to 
each  rainy  day  a   rainfall    amount   from  the  desired  marginal 
distribution. 

For  the  purposes  of  streamflow  prediction,  or  other  applications 
where  a  mass  balance  is  desired,  one  is  interested  in  the  distribution 
of  the  total   rainfall  over  the  next  t  days.     For  example,  for 
rainfal 1 /runoff  studies,  an  important  property  of  a  daily  rainfall 
generation  scheme  is  its  ability  to  preserve  the  total  rainfall 
amounts  over  periods  of  given  length,  i.e.,  one  week  or  one  month. 
The  statistical  properties  of  the  accumulated  rainfall  process  are 
given  below. 


m 


Let  R(t)  denote  the  accumulated  rainfall  process  over  a  period  of 
length  t.  Then, 


R(t)  =        Y.,  (7.10) 
i=l  ^ 


where  {Y^ }  is  the  process  of  the  non-zero  daily  rainfall  amounts  and 
{N^}  is  the  daily  rainfall  occurrence  process.    Making  the  assumption 
that  the  non-zero  daily  rainfall  amounts  {Y.},  are  independent  and 
identically  distributed,  and  that  the  daily  rainfall  occurrence 
process  {N^}  is  independent  of  the  rainfall  amounts  process  {Y^-}  ,  the 
mean  and  variance  of  R(t)  are  given  as 


E[R(t)]  =  u^mt  (7.11) 


and 


Var[R(t)]  =  a^mt  +  y/v(t),  (7.12) 


where      =  E[Y.],        =  Var(Y.),  V(t)  is  the  variance  time  curve  of 

the  counting  process  {     },  and  m  is  its  rate  of  occurrence.    For  a 

semi-Markov  model,  m  and  V(t)  are  given  in  terms  of  the  parameters  a^ , 

a^,  p-^ ,  and  [i^,  from  equations  (5.29)  and  (5.34)  of  Chapter  5.    For  a 

2 

mixed  exponential  distribution,  and  are  given  in  terms  of  the 
parameters  a,  X^,  (7.8). 

Table  7.8  shows  the  empirical   seasonal  means  and  standard 
deviations  together  with  their  theoretical  counterparts  for  the  fitted 
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Table  7.8    Comparison  of  the  Empirical  and  Theoretical  Seasonal  Means 
and  Standard  Deviations 


Mean  Standard  Deviation 


Season 

Empirical 

Theoretical 

Empi  rical 

Theoretical 

(a)    Snoqualmie  Falls 

1 

22.145 

22.426 

5.821 

5.288 

2 

10.621 

10.255 

2.643 

2.513 

3 

3.352 

3.468 

1.587 

1.484 

4 

8.689 

8.238 

2.749 

2.699 

5 

17.789 

17.506 

4.445 

4.729 

(b)  Roosevelt 

1 

4.481 

4.086 

2.326 

1.688 

2 

0.542 

0.676 

0.649 

0.629 

3 

1.341 

1.767 

0.841 

1.156 

4 

1.768 

1.369 

1.654 

0.964 

5 

2.570 

2.511 

2.188 

1.644 

6 

2.698 

2.573 

2.620 

1.519 
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model.     The  preservation  of  these  seasonal   statistics  are  very 
satisfactory  for  Snoqualmie  Falls,  whereas  for  the  station  of 
Roosevelt  the  results  are  not  as  good.    This  is  not  surprising  given 
the  small  number  of  events  available  for  the  estimation  of  the  model 
parameters. 


CHAPTER  8 
SUMMARY  AND  CONCLUSIONS 


And  so  these  men  of  Indostan 
Disputed  loud  and  long, 

Each  in  his  own  opinion 

Exceeding  still  and  strong. 

Though  each  was  partly  in  the  right 
And  all  were  in  the  wrong  ! 

John  Godfrey  Saxe  (1816-1887) 
Reprinted  in  Engineering  Concepts 
Curriculum  Project  (1971) 

Several  authors  have  recently  had  apparent  success  in  applying 
continuous-time  point  process  models  to  daily  rainfall  observation 
sequences.    In  this  work  we  have  shown  that  major  problems  arise  when 
the  observation  sequence  represents  cumulative  rainfall  amounts  over  a 
period  (e.g.,  one  day)  which  is  on  the  order  of  the  process 
interarrival  time.    In  particular,  the  use  of  continuous-time  point 
process  models  for  daily  rainfall  occurrences  may  result  in  incorrect 
inferences  about  the  underlying  rainfall  generating  mechanisms.  Since 
daily  rainfall  occurrences  form  a  discrete  point  process,  it  seems 
only  natural  that  daily  rainfall  sequences  should  be  compared  with  the 
discrete  independent  Bernoulli,  and  not  with  the  continuous  Poisson 
process.    Daily  rainfall  structures  that  are  underdispersed  (more 
regular  occurrences)  relative  to  the  independent  Poisson  process  may 
in  fact  be  overdi spersed  (more  random  occurrences)  relative  to  the 
Bernoulli  process. 
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The  statistical  analysis  of  six  daily  rainfall  records  from 
diverse  climatologic  regimes  throughout  the  U.S.  (Snoqualmie  Falls, 
Washington;   Roosevelt,  Arizona;   Austin,   Texas;   Miami,  Florida; 
Philadelphia,  Pennsylvania;  and  Denver,  Colorado)  has  confirmed  the 
inappropriateness  of  the  continuous  point  process  modeling  approach 
for  the  daily  rainfall  occurrence  process.     The  daily  rainfall 
occurrences  for  some  months  and  some  stations  are  underdispersed 
relative  to  Poisson,  a  condition  that  is  inconsistent  with  the 
continuous  point  process  models  used  by  other  authors.  However, 
comparison  of  the  statistics  of  the  rainfall  occurrence  processes  at 
these  stations  with  the  Bernoulli  indicated  that  all  were  clustered, 
that  is,  overdispersed,  which  is  consistent  with  the  underlying 
physical  processes.     A  further  disadvantage  of  continuous  point 
process  models  is  that  they  cannot  be  used  for  the  generation  of 
synthetic  rainfall  sequences.     It  has  been  shown  that,  using  these 
models,  generation  of  synthetic  rainfall  sequences  leads  to  serious 
upward  biases  in  the  event  interarrival  times  and  in  dependence 
structures  which  may  be  much  different  than  those  of  the  apparent 
generating  process. 

To  meet  these  shortcomings  of  continuous  point  process  models,  a 
discrete  point  process  model  has  been  developed  and  its  structural 
properties  derived.       The  model  belongs  to  the  class  of  semi -Markov 
(or  Markov  renewal)  processes  and  has  a  flexible  structure.    In  the 
semi-Markov  model  the  sequence  of  times  between  events  is  formed 
through  sampling  from  two  geometric  distributions,  according  to 
transition  probabilities  specified  by  a  Markov  chain.    In  that  sense, 
higher  probabilities  of  transition  from  the  geometric  distribution 
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with  the  smaller  mean  to  the  same  geometric  distribution,  rather  than 
to  the  one  with  the  larger  mean,  incorporates  a  clustering  structure 
in  the  process. 

Several  methods  for  fitting  the  proposed  model  have  been  studied. 
Due  to  the  heavy-tailed  distributions  of  the  interarrival  times, 
method  of  moment  estimates  do  not  perform  well.     An  approximate 
likelihood  method  which  estimates  the  equilibrium  probabilities  of  the 
Markov   chain   of    intervals,    and    subsequently   the  transition 
probabilities,  has  been  proposed.    This  approximate  maximum  likelihood 
approach  was  found  to  perform  adequately,  especially  for  daily 
rainfall  structures  with  small  autocorrelations  in  the  sequence  of 
interarrival  times. 

The  semi-Markov  model  was  fitted  to  the  daily  rainfall 
occurrences  of  the  Snoqualmie  Falls  and  Roosevelt  stations,  both  on  a 
monthly  and  seasonal  basis.    Seasons  were  selected  after  a  careful 
examination  of  all  the  statistical  properties  of  intervals  and  counts. 
The  fit  of  the  model  was  assessed  by  the  preservation  of  selected 
statistical  properties  of  the  series  which  were  not  used  directly  in 
the  estimation.    It  was  shown  that  the  fitted  model  gave  a  theoretical 
spectrum  of  counts  surprisingly  close  to  the  empirical  one.  The 
semi-Markov  model  of  daily  rainfall  occurrences  coupled  with  a  mixed 
exponential  distribution  for  the  non-zero  daily  rainfall  amounts 
preserved  the  seasonal   means   and   standard  deviations   for  the 
Snoqualmie  Falls  station,  but  not  for  the  Roosevelt  station.  The 
preservation  of  cumulative  rainfall  amounts  over  longer  periods  (e.g., 
weeks  or  months)  is  an  important  property  of  a  daily  rainfall 
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generation  model,   especially   when   the   model    is    used  for 
rainfall /runoff  studies. 

The  proposed  use  of  discrete- time  point  process  models  (including 
the  semi-Markov  approach)  for  daily  rainfall  occurrences  opens  a 
number  of  areas  for  future  research.    Among  these  are  the  following: 

(1)  The  possible  use  of  alternate  discrete  point  process  model 
structures  for  daily  rainfall.     For  example,  it  seems  feasible  to 
derive  discrete  point  process  models  with  structures  similar  to  the 
two-level  hierarchical  structures  of  the  continuous  Poisson  cluster 
models,  i.e.,  a  discrete  analogue  of  the  Neyman-Scott  model. 

(2)  Improved  fitting  techniques  for  discrete  point  process  models. 
Although  continuous  point  process  models  have  been  extensively  studied 
statistically,  not  much  work  has   been  done  on  discrete  point 
processes.     Specifically,  for  daily  rainfall,  alternate  fitting 
methods  that  explicitly  preserve  the  monthly  or  seasonal  rainfall 
statistics  might  be  investigated.     This  would  probably  require  an 
iterative  estimation  scheme  to  accommodate  the  trade-off  between  exact 
preservation  of  short  and  long  term  statistics. 

(3)  Application  of  the  semi-Markov  model  to  shorter  time  increment 
rainfall  sequences,  such  as  hourly.  In  particular,  the  compatibility 
of  the  semi-Markov  model  with  continuous  point  process  models  applied 
to  the  unobserved  continuous  generalized  stochastic  process,  ^(t)  (see 
figure  1.1),  could  be  investigated.    One  interesting  question,  related 
to  ongoing  work  on  process  scale  effects,  is  to  determine  whether  the 
statistics  for  daily  rainfall  derived  using  a  continuous  Neyman-Scott 
model  for  5(t)  are  in  agreement  with  the  statistics  obtained  by 
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modeling  the  daily  rainfall  sequences  directly  with  a  semi-Markov 
model . 

(4)  Improved  methods  for  coupling  rainfall  occurrence  models  with 
rainfall  amounts  models.  An  area  deserving  further  attention  is  the 
development  of  a  model  structure  that  accounts  for  cross-correlation 
between  the  occurrence  and  amounts  processes. 

(5)  Extension  of  discrete  point   process  models  to  multiple 
dimensions.     This  is  essential  generalization  for  rainfall -runoff 
studies  and  for  the  estimation  of  missing  data  in  rainfall  sequences. 


APPENDIX  A 


STATISTICAL  PROPERTIES  OF  THE  SIX  DAILY  RAINFALL 
RECORDS  ANALYZED 
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Figure  A.l    Number  of  rainy  days  per  year  and  total  annual 

rainfall  amounts  for  Snoqualmie  Falls,  Washington. 
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Figure  A. 2    Number  of  rainy  days  per  year  and  total  annual 
rainfall  amounts  for  Roosevelt,  Arizona. 


Figure  A. 3    Number  of  rainy  days  per  year  and  total  annual 
rainfall  amounts  for  Austin,  Texas. 
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Figure  A. 4 


Number  of  rainy  days  per  year  and  total  annual 
rainfall  amounts  for  Miami,  Florida. 
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Figure  A. 5    Number  of  rainy  days  per  year  and  total  annual 
rainfall  amounts  for  Philadelphia,  Pennsylvania. 
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Figure  A. 6    Number  of  rainy  days  per  year  and  total 
rainfall  amounts  for  Denver,  Colorado. 
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Table  A. 2    Statistics  of  Interarrival  Times--Monthly  Analysis 


Number 

Month 

X 

of  events 

(a)    Snoqualmie  Falls 

Jan 

1.393 

1.171 

0.841 

3.825 

667 

Feb 

1.499 

1.328 

0.886 

3.933 

557 

Mar 

1.542 

1.548 

1.004 

5.542 

607 

Apr 

1.717 

1.588 

0.925 

3.060 

530 

May 

2.214 

2.501 

1.129 

3.190 

429 

Jun 

2.693 

4.354 

1.617 

4.941 

375 

Jul 

4.259 

6.080 

1.428 

2.912 

205 

Aug 

3.323 

4.226 

1.272 

2.259 

260 

Sep 

2.708 

3.764 

1.390 

4.266 

332 

Oct 

1.752 

1.668 

0.952 

3.084 

499 

Nov 

1.442 

1.231 

0.854 

4.217 

613 

Dec 

1.343 

0.978 

0.728 

4.167 

694 

(b)  Roosevelt 

Jan 

6.241 

9.059 

1.452 

2.500 

145 

Feb 

6.224 

10.174 

1.635 

2.978 

125 

Mar 

7.604 

13.184 

1.734 

3.366 

139 

Apr 

12.693 

21.596 

1.701 

2.399 

88 

May 

22.524 

21.968 

0.975 

0.601 

42 

Jun 

14.455 

13.661 

0.945 

0.480 

33 

Jul 

4.671 

4.759 

1.019 

1.739 

152 

Aug 

6.052 

10.880 

1,798 

4.345 

191 

Sep 

7.319 

9.675 

1.322 

2.533 

116 

Oct 

9.066 

13.439 

1.482 

1.961 

106 

Nov 

8.216 

11.108 

1.352 

2.138 

97 

Dec 

7.761 

16.275 

2.097 

6.345 

134 

(c) 

Austin 

Jan 

3.513 

3.973 

1.313 

2.017 

240 

Feb 

3.881 

5.315 

1.369 

4.277 

227 

Mar 

4.480 

4.484 

1.001 

1.585 

200 

Apr 

3.808 

4.048 

1.063 

2.329 

224 

May 

4.232 

5.258 

1.242 

2.656 

241 

Jun 

5.197 

6.972 

1.342 

2.038 

178 

Jul 

6.553 

8.724 

1.331 

2.466 

141 

Aug 

5.278 

6.531 

1.237 

2.251 

162 

Sep 

4.613 

6.471 

1.403 

3.195 

212 

Oct 

4.654 

5.988 

1.287 

3.254 

188 

Nov 

4.615 

6.059 

1.313 

3.613 

192 

Dec 

4.359 

5.924 

1.359 

3.069 

209 

129 


Table  A. 2  (Continued) 


Month 

X 

Number 

c      of  events 
s 

(d)  Miami 

Jan 

4.640 

4.891 

Feb 

6.160 

6.372 

Mar 

5.802 

6.199 

Apr 

4.989 

5.751 

May 

2.261 

2.615 

Jun 

1.989 

1.974 

Jul 

2.157 

1.919 

Aug 

1.829 

1.472 

Sep 

1.840 

1.625 

Oct 

2.650 

3.068 

Nov 

4.741 

5.135 

Dec 

4.913 

5.521 

1.054 

2.243 

197 

1.034 

1.631 

131 

1.068 

1.866 

162 

1.153 

1.958 

174 

1.157 

3.333 

375 

0.993 

3.248 

444 

0.889 

2.476 

439 

0.805 

2.396 

502 

0.884 

2.642 

486 

1.158 

3.244 

366 

1.083 

1.994 

197 

1.124 

2.158 

195 

(e)  Philadelphia 


Jan 

2.711 

2.517 

Feb 

2.978 

2.555 

Mar 

2.854 

2.521 

Apr 

2.888 

2.782 

May 

2.879 

2.930 

Jun 

2.993 

2.680 

Jul 

3.395 

2.843 

Aug 

3.502 

3.845 

Sep 

3.737 

3.867 

Oct 

4.221 

4.486 

Nov 

3.251 

3.245 

Dec 

2.807 

2.503 

0.928 

2.107 

332 

0.858 

1.865 

278 

0.883 

2.118 

323 

0.963 

2.275 

322 

1.018 

2.726 

323 

0.895 

2.076 

297 

0.837 

1.391 

266 

1.098 

2.638 

273 

1.035 

2.210 

243 

1.063 

2.130 

213 

0.998 

2.239 

287 

0.892 

1.706 

316 

(f)  Denver 


Jan 

5.458 

6.015 

Feb 

4.421 

4.794 

Mar 

3.668 

4.513 

Apr 

3.552 

4.078 

May 

2.965 

3.071 

Jun 

3.483 

4.047 

Jul 

3.114 

3.132 

Aug 

3.836 

4.035 

Sep 

4.872 

6.304 

Oct 

5.993 

7.049 

Nov 

5.813 

6.121 

Dec 

5.560 

7.067 

1.102 

1.777 

168 

1.084 

2.502 

164 

1.230 

2.950 

244 

1.148 

2.130 

259 

1.036 

2.006 

283 

1.162 

2.752 

261 

1.006 

2.002 

280 

1.052 

2.447 

244 

1.294 

2.977 

187 

1.176 

1.861 

141 

1.053 

1.633 

155 

1.271 

2.439 

150 

130 
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Table  ^A    Statistics  of  Non-Zero  Daily  Rainfall  Amounts-- 
Monthly  Analysis 


Month 

X 

(a)  Snoqualmie  Falls 

Jan 

0.415 

0.462 

1.112 

2.060 

Feb 

0.360 

0.449 

1.247 

3.196 

Mar 

0.306 

0.371 

1.212 

3.914 

Apr 

0.253 

0.264 

1.041 

1.695 

May 

0.228 

0.267 

1.174 

2.640 

Jun 

0.231 

0.300 

1.301 

2.641 

Jul 

0.217 

0.281 

1.296 

2.099 

Aug 

0.213 

0.273 

1.282 

2.417 

Sep 

0.227 

0.335 

1.208 

1.922 

Oct 

0.337 

0.362 

1.073 

1.789 

Nov 

0.408 

0.459 

1.125 

1.940 

Dec 

0.408 

0.480 

1.177 

2.575 

(b)  Roosevelt 

Jan 

0.308 

0.347 

1.125 

1.622 

Feb 

0.239 

0.270 

1.130 

1.941 

Mar 

0.305 

0.353 

1.158 

2.632 

Apr 

0.185 

0.215 

1.162 

1.930 

May 

0.169 

0.205 

1.217 

2.157 

Jun 

0.278 

0.327 

1.174 

1.658 

Jul 

0.262 

0.352 

1.343 

2.757 

Aug 

0.278 

0.367 

1.322 

2.801 

Sep 

0.319 

0.451 

1.412 

2.565 

Oct 

0.366 

0.577 

1.578 

3.521 

Nov 

0.289 

0.379 

1.314 

2.627 

Dec 

0.395 

0.471 

1.192 

1.973 

(c)  Austin 

Jan 

0.204 

0.349 

1.715 

4.647 

Feb 

0.333 

0.535 

1.606 

2.604 

Mar 

0.230 

0.326 

1.415 

2.658 

Apr 

0.457 

0.623 

1.365 

2.084 

May 

0.485 

0.683 

1.409 

2.539 

Jun 

0.530 

0.738 

1.392 

2.261 

Jul 

0.352 

0.607 

1.727 

3.742 

Aug 

0.414 

0.621 

1.502 

3.003 

Sep 

0.502 

0.725 

1.445 

2.612 

Oct 

0.558 

0.821 

1.471 

2.721 

Nov 

0.301 

0.537 

1.781 

4.241 

Dec 

0.287 

0.486 

1.694 

3.449 
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Table  A. 4  (Continued) 


Month 

X 

(d)  Miami 

Jan 

0.316 

0.411 

1.302 

1.994 

Feb 

0.373 

0.588 

1.578 

3.527 

Mar 

0.359 

0.686 

1.910 

6.316 

Apr 

0.585 

1.161 

1.986 

6.151 

May 

0.584 

0.884 

1.512 

3.195 

Jun 

0.562 

0.778 

1.385 

2.745 

Jul 

0.367 

0.478 

1.300 

2.592 

Aug 

0.424 

0.637 

1.502 

3.501 

Sep 

0.469 

0.667 

1.443 

3.041 

Oct 

0.482 

0.828 

1.719 

3.917 

Nov 

0.372 

0.831 

2.235 

4.400 

Dec 

0.280 

0.411 

1.471 

2.263 

(e)  Philadelphia 

Jan 

0.265 

0.334 

1.260 

2.416 

Feb 

0.300 

0.338 

1.128 

1.758 

Mar 

0.349 

0.393 

1.125 

1.861 

Apr 

0.317 

0.388 

1.227 

2.169 

May 

0.307 

0.387 

1.259 

2.095 

Jun 

0.387 

0.600 

1.551 

2.985 

Jul 

0.423 

0.587 

1.388 

2.293 

Aug 

0.449 

0.618 

1.337 

2.707 

Sep 

0.421 

0.650 

1.545 

3.410 

Oct 

0.381 

0.512 

1.345 

2.488 

Nov 

0.355 

0.525 

1.479 

3.309 

Dec 

0.335 

0.394 

1.176 

1.604 

(f)  Denver 

Jan 

0.093 

0.137 

1.480 

3.117 

Feb 

0.120 

0.151 

1.257 

2.471 

Mar 

0.145 

0.182 

1.253 

2.904 

Apr 

0.205 

0.369 

1.805 

4.948 

Mar 

0.249 

0.435 

1.748 

3.526 

Jun 

0.194 

0.365 

1.878 

4.499 

Jul 

0.198 

0.304 

1.534 

2.679 

Aug 

0.161 

0.288 

1.788 

3.502 

Sep 

0.202 

0.283 

1.402 

2.358 

Oct 

0.198 

0.258 

1.301 

2.518 

Nov 

0.146 

0.157 

1.075 

1.455 

Dec 

0.102 

0.141 

1.386 

4.267 
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Table  A. 5    Cross  Correlation  Coefficients  of  the  Non-Zero  Daily 

Rainfall  Amounts  with  Preceding  and  Following  Interarrival 
Times  (X.  =  interarrival  time  following  the  event  P.) 


(a)    Snoqualmie  Falls 


Jan 

-0.036 

-0.080* 

-0.089* 

-0.041 

Feb 

0.033 

-0.120** 

-0.129** 

-0.047 

Mar 

-0.063 

-0.022 

-0.011 

-0.065 

Apr 

-0.068 

-0.035 

0.006 

0.032 

May 

-0.019 

-0.024 

-0.117 

-0.124* 

Jun 

-0.055 

-0.037 

-0.007 

-0.051 

Jul 

-0.029 

-0.037 

-0.198** 

-0.118 

Aug 

-0.001 

-0.099 

-0.132* 

-0.168** 

Sep 

0.005 

-0.086 

-0.117* 

-0.095 

Oct 

0.035 

-0.064 

-0.092* 

-0.149** 

Nov 

0.033 

0.030 

-0.111** 

-0.097** 

Dec 

-0.028 

-0.134** 

-0.132** 

0.013 

(b)  Roosevelt 

Jan 

-0.046 

-0.084 

-0.112 

-0.103 

Feb 

-0.056 

-0.010 

0.059 

0.102 

Mar 

0.247** 

0.026 

-0.058 

-0.188* 

Apr 

0.049 

0.194 

-0.166 

0.044 

May 

0.164 

-0.067 

-0.201 

-0.065 

Jun 

-0.102 

-0.036 

-0.011 

0.248 

Jul 

-0.085 

0.019 

-0.014 

-0.052 

Aug 

-0.035 

-0.019 

-0.007 

-0.021 

Sep 

-0.025 

-0.075 

0.064 

-0.006 

Oct 

-0.029 

0.004 

-0.157 

-0.167 

Nov 

-0.066 

0.013 

0.039 

0.048 

Dec 

-0.091 

-0.007 

-0.121 

-0.130 

(c)  Austin 

Jan 

-0.021 

0.070 

-0.023 

0.013 

Feb 

0.044 

0.037 

0.128 

-0.016 

Mar 

0.165* 

-0.001 

-0.104 

0.007 

Apr 

-0.016 

-0.084 

-0.070 

-0.008 

May 

0.025 

-0.039 

0.054 

0.017 

Jun 

-0.048 

0.125 

0.057 

0.168* 

Jul 

-0.055 

-0.031 

-0.063 

0.060 

Aug 

0.016 

0.044 

-0.085 

-0.109 

Sep 

0.033 

-0.006 

0.041 

0.105 

Oct 

-0.005 

0.029 

0.007 

0.061 

Nov 

0.009 

-0.027 

-0.021 

-0.089 

Dec 

0.068 

0.276** 

-0.046 

-0.009 
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Table  A. 5  (Continued) 


(Xj-z'Pj)  ^^-'^^ 

(d)  Miami 


Jan 

0.017 

0.159 

0.061 

-0.115 

Feb 

-0.048 

-0.088 

-0.033 

-0.054 

Mar 

0.078 

-0.008 

-0.078 

0.035 

Apr 

-0.041 

-0.137 

0.035 

-0.143 

May 

-0.050 

-0.081 

-0.080 

-0.033 

Jun 

-0.014 

0.008 

-0.019 

-0.011 

Jul 

-0.029 

-0.038 

-0.029 

-0.014 

Aug 

0.018 

-0.071 

-0.016 

-0.004 

Sep 

0.037 

-0.016 

-0.080 

-0.001 

Oct 

-0.052 

0.033 

0.043 

0.033 

Nov 

0.020 

-0.061 

-0.126 

-0.114 

Dec 

0.077 

-0.094 

-0.029 

-0.047 

(e)  Philadelphia 


Jan  0.073  0.019  0.057  0.139* 

Feb  0.039  -0.087  0.008  0.045 

Mar  0.076  0.062  0.079  0.008 

Apr  0.044  0.047  -0.061  -0.034 

May  -0.094  -0.061  -0.015  0.001 

Jun  -0.029  0.130*  0.049  -0.048 

Jul  -0.013  -0.008  -0.010  -0.028 

Aug  0.095  0.142*  -0.075  -0.087 

Sep  0.000  0.059  0.120  -0.035 

Oct  0.101  0.050  0.064  0.143* 

Nov  -0.026  -0.051  0.055  -0.003 

Dec  0.112*  -0.045  -0.051  -0.030 

(f)  Denver 


Jan 

-0.048 

-0.020 

-0.112 

0.000 

Feb 

-0.003 

-0.056 

0.050 

-0.054 

Mar 

-0.010 

0.009 

0.045 

0.015 

Apr 

-0.049 

0.204** 

-0.056 

0.004 

May 

-0.052 

0.019 

0.071 

-0.104 

Jun 

0.093 

-0.010 

-0.076 

-0.060 

Jul 

-0.045 

-0.035 

-0.040 

-0.002 

Aug 

-0.050 

0.077 

-0.007 

-0.050 

Sep 

0.046 

-0.031 

-0.004 

-0.049 

Oct 

0.035 

0.092 

-0.068 

-0.103 

Nov 

0.127 

-0.017 

0.145 

0.065 

Dec 

0.163* 

0.017 

0.051 

-0.052 
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I  II  III  IV 


Figure  A. 7    Statistical  properties  of  intervals  and  counts  for 
Snoqualmie  Fal ls--monthly  analysis. 

I:       Normalized  spectrum  of  counts  vs.  frequency  factor 
II:     Log-survivor  function  vs.  interarrival  time  (days) 
III:    Variance  of  counts  vs.  interval  length  (days) 
IV:      Index  of  dispersion  vs.  interval  length  (days) 
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I  II  III  IV 


Figure  A. 7  (continued) 
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Figure  A. 8    Statistical  properties  of  intervals  and  counts  for 
Roosevelt--monthly  analysis 

I:       Normalized  spectrum  of  counts  vs.  frequency  factor 
li:     Log-survivor  function  vs.  interarrival  time  (days) 
III:    Variance  of  counts  vs.  interval  length  (days) 
IV:      Index  of  dispersion  vs.  interval  length  (days) 
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Figure  A. 8  (continued) 
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Figure  A. 9    Statistical  properties  of  intervals  and  counts  for 
Austin--monthly  analysis 

I:       Normalized  spectrum  of  counts  vs.  frequency  factor 
II:     Log-survivor  function  vs.  interarrival  time  (days) 
III:    Variance  of  counts  vs.  interval  length  (days) 
IV:      Index  of  dispersion  vs.  interval  length  (days) 
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Figure  A. 9  (continued) 
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I  II  III  IV 


Figure  A. 10    Statistical  properties  of  intervals  and  counts  for 
Miami--monthly  analysis 

I:       Normalized  spectrum  of  counts  vs.  frequency  factor 
II:      Log-survivor  function  vs.  interarrival  time  (days) 
III:    Variance  of  counts  vs.  interval  length  (days) 
IV:      Index  of  dispersion  vs.  interval  length  (days) 
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I  II  III  IV 


Figure  A. 10  (continued) 
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Figure  A. 11    Statistical  properties  of  intervals  and  counts  for 
Philadelphia--monthly  analysis 

I:       Normalized  spectrum  of  counts  vs.  frequency  factor 
II:     Log-survivor  function  vs.  interarrival  time  (days) 
III:    Variance  of  counts  vs.  interval  length  (days) 
IV:      Index  of  dispersion  vs.  interval  length  (days) 
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Figure  A. 11  (continued) 
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Figure  A.  12    Statistical  properties  of  intervals  and  counts  for 
Denver--monthly  analysis 

I:       Normalized  spectrum  of  counts  vs.  frequency  factor 
II:     Log-survivor  function  vs.  interarrival  time  (days) 
III:    Variance  of  counts  vs.  interval  length  (days) 
IV:      Index  of  dispersion  vs.  interval  length  (days) 


APPENDIX  B 


DETAILS  ON  THE  DERIVATION  OF  THE  CONDITIONAL  INTENSITY  FUNCTION 

OF  THE  SEMI-MARKOV  PROCESS 


The  Laplace  transform  of  the  conditional  intensity  function  of  a 
general  two-state  semi-Markov  process  is  given  as 


(1-aJf  *(s)  +  (1-aJf  *(s)  +  (l-a,-aj(2-a,-ajf,*(s)f  *(s) 

h*{s)  =  ^-J  ^-^  ^-^  ^  

(2-a^-a2)  l-a^fi*(s)  -  a2f2*(s)  -  (l-a^-a2)fi*{s)f2*(s) 

(B.l) 


where  f^*(s),  i=l,2,  are  the  Laplace  transforms  of  the  two  probability 
density  functions  of  the  interarriva!  times  (see  Cox  and  Lewis,  1978). 
A  geometric  distribution  with  parameter  p,  can  be  written  in  the 
continuous-function  form 


f(t)  =    I  p(l-p)''"^6(t-k)    ,  (B.2) 

k=l 

where  is  the  Dirac  function.  The  Laplace  transform  of  f(t)  can 
be  easily  shown  to  be 

f*(s)  =   r    ,  (B.3) 

1  -  (I-P)e"' 
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where  we  have  made  use  of  the  fact  that  ^(5(t-k))  =  e"*^^.  Expressions 
for  f^*(s)  and  f2*(s),  analogous  to  (B.3),  then  substituted  in  (B.l) 


to  give 


(2-a^-a2)h*(s)e^ 


p^il-a^)  +  P2(l-3i)  +  [PiP2(2-a^-a2)^  -  Pi(l-a2)  -  P2(l-a^)]e"^ 
1  +  [pj(l-a^)  +  P2(l-a2)  -  2]e"^  +  [1  -  p^d-a^)  -  P2(l-a2)]e"^^ 


6     [p^P2(2-a^-a2)^  -  Sle''       ^  ^^^^^ 
1  +  (e-2)e"^  +  (l-6)e"^^ 


where 


5  =  p^(l-a2)  +  P2(l-ai) 
and  (B.5) 
B  =  p^(l-a^)  +  P2(l-a2) 


To  obtain  h(t)  the  inverse  Laplace  transform  of  (8.4)  is  needed.  The 
polynomial  in  the  denominator  of  (B.4)  has  two  real  roots,  one  equal 
to  1  and  the  other  equal  to  (1-6),  and  therefore  (B.4)  can  be  written 
as 

h*(s)e'=— ^  +   ^   .  (B.6) 

1-e"^     1  -  (l-B)e"^ 


where  G  and  A  are 
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A  =   ^  G    .  (B.8) 


By  taking  the  inverse  Laplace  transform  of  (B.6) 


5^"kh*(s)e^]  =    E  [G  +  A(l-e)'']6(t-k)      ,  (B.9) 
k=0 


h(t)  takes  the  form 


h(t)  =    E  [6  +  A{l-e)'^]6(t-k)      .  (B.IO) 
k=l 


Comparing  (B.IO)  with  the  discrete-analogue  expression  of  the 
conditional  intensity  function,  that  is 


h(t)  =    Z  h.  5(t-k) 
k=l  ^ 


we  can  write 


h,^  =  G  +  Ad-B)*^      .  (B.ll) 
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Note  that  G  of  (B.7)  reduces  to  the  constant  intensity,  m,  of  the 
semi-Markov  model.    Therefore,  (B.ll)  takes  the  form 


where. 


and 


h^  =  m  +  AW'^  (B.12) 


A  =  e^p^  +  Q^Pz  '  ^  (B.13) 


W  =  1  -  Pjd-a^)  -  P2(l-a2)      •  (B.14) 


This  completes  the  proof  of  PROPOSITION  3  of  Chapter  5  which  gives  the 
conditional  intensity  function  of  a  semi-Markov  process. 
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